It is a simple one but I'm a struggling a bit.
What I have :
What I want :
I want to remove the v0 , v1 and etc.
I'm using this piece of code
data IndieDay20140704;
set IndieDay20140704;
do i=1 to 5;
VAR1=tranwrd(var1,"v&i","");
end;
run;
It is not working correctly as it is giving me this instead (see below) plus the error
WARNING: Apparent symbolic reference I not resolved.
Questions:
1) Do I need a macro?
2) Why the error?
Many thanks for your insights.
There's an error because you're (unintentionally) using macro variable i, that you did not initialize.
I guess the idea of tranwrd is to remove words in VAR2, VAR3.. from VAR1.
The logical error is to do it also for VAR1 itself.
Check if this helps (using array):
data IndieDay20140704;
length VAR1 VAR2 VAR3 VAR3 VAR5 $10;
VAR1 = 'TEST IT';VAR5 = 'TEST';
output;
VAR1 = 'STEST IT';VAR5 = 'TEST';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
array vals VAR1 - VAR5;
do i=1 to dim(vals);
if i ne 1 then VAR1=tranwrd(var1,trim(vals(i)),"");
end;
drop i;
run;
Here I'm creating a SAS view on top of table (not a good idea to overwrite the source).
Also I think you should trim() the values from VAR2,VAR3... depending on what you want to achieve and what's in the data.
EDIT:
here the version with 'v0', 'v1'...'v5' strings:
data IndieDay20140704;
length VAR1$10;
VAR1 = 'TEST v0';
output;
VAR1 = 'TEST v11';
output;
VAR1 = 'TEST v1';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
org_var1 = var1;
do i=0 to 5;
var1 =tranwrd(var1, catt('v', put(i, 1. -L)),"");
end;
run;
catt('v', put(i, 1. -L)) concatenates string 'v' and the result of put.
put(i, 1. -L)) converts numeric variable i to text using plain numeric format w.d, 1. used here - enough for single digit numbers, -L left aligns the result
Here's one way, there are many others and this may not work if your data has a lot of variability.
data have;
length VAR1$10;
VAR1 = 'fic19v0.csv';
output;
VAR1 = 'fic19v1.cs';
output;
run;
data want ;
set have;
original_var=var1;
var1=substr(var1, 1, index(var1, ".")-3)||".csv";
run;
Related
I have a SAS dataset with 2 columns that I want to compare (VAR1 and VAR2). I would like to check if for each value of VAR1 this value exists anywhere in the column VAR2. If the VAR1 value does not exist anywhere in the column VAR2 I want to flag it as 1.
For exemple :
I have this :
TABLE in
VAR1
VAR2
k3
t7
t7
g7
p8
k3
...
...
And would want this
TABLE out
VAR1
VAR2
FLAG
k3
t7
0
t7
g7
0
p8
k3
1
...
...
...
I tried using
FLAG = ifn(indexw(VAR2,VAR1,0,1)
But this method only compare the two columns for the current row.
Thank you in advance for your help !
Edit : I tried running this code as suggested by Joe but ran into an error.
Code :
data your_table;
length VAR1 $2;
length VAR2 $2;
input VAR1 VAR2;
datalines;
k3 t7
t7 g7
p8 k3
;
data for_fmt;
set your_table;
fmtname = 'VAR2F';
start = var2;
label = '0';
output;
if _n_ eq 1 then do;
hlo = 'o';
start = .;
label = '1';
output;
end;
run;
proc sort nodupkey data=for_fmt;
by start;
run;
proc format cntlin=for_fmt;
quit;
data want;
set your_table;
flag = put(var1,var2f.);
run;
Error:
ERROR: This range is repeated, or values overlap: .- ..
In SAS, everything is based on one row at a time in the data step, so you can't do what you're looking to directly.
What you can do, though, is use a lookup technique - there are quite a few - and that will let you get what you're after.
The easiest one to use in your case is probably a format.
data for_fmt;
set your_table;
fmtname = 'VAR2F';
start = var2;
label = '0';
output;
if _n_ eq 1 then do;
hlo = 'o'; *this is for "other" (not found) records;
start = .;
label = '1';
output;
end;
run;
proc sort nodupkey data=for_fmt;
by start;
run;
proc format cntlin=for_fmt;
quit;
data want;
set your_table;
flag = put(var1,var2f.);
run;
This is pretty fast (only limited by dataset read/write time) unless you have millions of unique rows.
You could also merge the dataset to itself, or do this in SQL, or use a hash table, but the format approach is probably simplest.
As #Joe says, it is always on one row at a time in the data step. So if you can make all possible value of var1 in one row, you can do the character match work easily.
data your_table;
length VAR1 $2;
length VAR2 $2;
input VAR1 VAR2;
datalines;
k3 t7
t7 g7
p8 k3
;
data want;
array _char_[&sysnobs.]$200._temporary_;
do until(eof1);
set your_table end=eof1;
i+1;
_char_[i]=var2;
end;
do until(eof2);
set your_table end=eof2;
flag=1;
do i=1to dim(_char_) until(flag=0);
if var1=_char_[i] then flag=0;
end;
output;
end;
run;
Here's the method I typically use in situations like this. I first create a list of the variables to check against, then merge with that and can easily pick out the ones that are found.
proc sort data=have (keep=var2) out=var2levels (rename=(var2=var1)) nodupkey; by var2;
proc sort data=have; by var1;
data want;
merge have (in=in1) var2levels (in=in2);
by var1;
if in1;
flag = in2;
run;
So here the first proc sort creates a list of all the unique values of var2. The output data set renames that to var1 for merging purposes (this can be done more clearly but less efficiently by renaming multiple variables). Then we simply merge the original data set (keeping all records) with the list of existing var2 values and set the flag accordingly.
So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to using SAS macros and loops, but have tried searching and piecing together code from some online sites, unfortunately, when I run it, it does nothing, not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution which is fully data driven.
Store the number of terms in a macro variable to assign the length of arrays
Load terms to search into a temporary array
Loop through for each word and search the terms
Exit loop if you find the term to help speed up the process
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and craete flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' # ;
else put ' or ' #;
put 'index(term,' string :$quote. ')' #;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
data example1;
input var1 var2 var3;
datalines;
10 11 14
3 5 8
0 1 2
;
data example2;
input var;
datalines;
1
2
8
;
Let's say that the number of var variables depending on data input. I want to put that number to macro variable and use in another data step, for example:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
%let array_dim = dim(var_array);
do i = 1 to &array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim; /* doesn't work */
run;
%mend;
%m(example1);
%let array_dim = dim(var_array); doesn't work in second data step, because dim(var_array) isn't evaluated, but %eval or %sysevalf in declaring the macro variable does't work here. How to do that correctly?
You are mixing up macro code and data step code in a way that is not supported in SAS. If you want to assign a macro variable a value that you're generating as part of a data step, you need to use call symput.
Also, if you create a macro variable during a data step, you cannot resolve it during the same data step in the way that you are attempting to do (unless you use the resolve function...). It's easier just to use a data set variable for this.
So here's a fixed version of your code that I think probably does what you want:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
array_dim = dim(var_array);
/*Only export the macro variable once, for the first row*/
if _n_ = 1 then call symput('array_dim_mvar', array_dim);
do i = 1 to array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim_mvar;
run;
%mend;
%m(example1);
I want to achieve the same output but instead of harcoding each of the array-element use something like var1 - var10 but that would jump by 10 like decades.
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
/* here and below something like:
array pop_diff_amount {10} pop_diff_amount_1920 -- pop_diff_amount_2010;*/
array pop_diff_amount {10} pop_diff_amount_1920 pop_diff_amount_1930
pop_diff_amount_1940 pop_diff_amount_1950
pop_diff_amount_1960 pop_diff_amount_1970
pop_diff_amount_1980 pop_diff_amount_1990
pop_diff_amount_2000 pop_diff_amount_2010;
array pop_diff_prcnt {10} pop_diff_prcnt_1920 pop_diff_prcnt_1930
pop_diff_prcnt_1940 pop_diff_prcnt_1950
pop_diff_prcnt_1960 pop_diff_prcnt_1970
pop_diff_prcnt_1980 pop_diff_prcnt_1990
pop_diff_prcnt_2000 pop_diff_prcnt_2010;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
I am still beginner in it therefore I am not sure is this possible or easy to achieve.
Thanks!
Not automatic but not all that difficult either. First create a data set of the names then transpose and use an unexecuted set to bring in the names and then define arrays. Note how arrays are define using [*] and name: as you did with population_array.
data names;
do type = 'Amount','Prcnt';
do year=1920 to 2010 by 10;
length _name_ $32;
_name_ = catx('_','pop_diff',type,year);
output;
end;
end;
run;
proc print;
run;
proc transpose data=names out=pop_diff(drop=_name_);
var;
run;
proc contents varnum;
run;
data pop;
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
if 0 then set pop_diff;
array pop_diff_amount[*] pop_diff_amount:;
array pop_diff_prcnt[*] pop_diff_prcnt:;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
run;
proc print data=pop;
run;
SAS is automatically going to increment the array elements by 1. Here is an alternative solution that creates the variables using one extra step to create a set of macro variables that hold the desired variable names. Since you are basing them off of the variable POPULATION_<year>, we will simply grab the years from those variable names, create the variable names for the arrays that we want, and store them into a few macro variables.
proc sql noprint;
select cats('pop_diff_amount_', scan(name, -1, '_') )
, cats('pop_diff_prcnt_', scan(name, -1, '_') )
into :pop_diff_amount_vars separated by ' '
, :pop_diff_prcnt_vars separated by ' '
from dictionary.columns
where libname = 'SASHELP'
AND memname = 'US_DATA'
AND upcase(name) LIKE 'POPULATION_%'
;
quit;
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
array pop_diff_amount {*} &pop_diff_amount_vars.;
array pop_diff_prcnt {*} &pop_diff_prcnt_vars.;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
Getting the data out of the meta data (create variable year) would make coding life easier.
proc transpose data=sashelp.us_data out=us_pop(rename=(col1=Population));
by statename;
var population_:;
run;
data us_pop;
set us_pop;
by statename;
year = input(scan(_name_,-1,'_'),4.);
pop_diff_amount=dif(population);
pop_diff_prcnt =(population/lag(population))-1;
format pop_diff_prcnt percent10.2;
if first.statename then call missing(of pop_diff_amount pop_diff_prcnt);
drop _:;
run;
proc print data=us_pop(obs=10);
run;
I want to have a mean which is based in non zero values for given variables using proc means only.
I know we do can calculate using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables, so how can I calculate mean based on non zero values where in I am using all of those in the var statement as below:
proc means = xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition for non zero variables , it works but can we have something which would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data statement?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
mean = mean/qty;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.