New to SAS. I know the following code creates a macro variable that stores a list of variable names, but what do : and | mean?
%let v_lst = a b bb: t_v129 |
c tt: t_v16 t_v275 |
d: t_v56 |
;
The bar | has no fixed meaning here. It is probably used as a delimiter: the macro variable is later split into substrings delimited by |. This is often done with the %scan function and is a common way of processing lists.
The colon indicates a prefix: bb: means all variables whose names start with bb. Many SAS procedures and the DATA step accept variable lists written this way.
You can put anything in macro variables, and what counts is what you do with it next. Now as a convention, the | symbol is conveniently used as a field/value separator, while the colon has no clear "conventional" use that I know of. Depending on the context, it could mean that values on its left (columns/variables) are to be associated to values to its right (other columns maybe). But you'd really need to look further down the code and look for loops using &v_lst, probably along with scan() or %scan() functions.
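As a minimal sketch of the list processing described above, the macro below walks through the |-delimited groups with %scan and writes each one to the log. The macro name show_groups and its parameter are hypothetical; what the original program actually does with each group is not shown in the question.

```sas
/* Hypothetical helper: print each |-delimited group of a list */
%macro show_groups(list);
    %local i group;
    %let i = 1;
    /* %scan returns the i-th piece, using | as the only delimiter */
    %let group = %scan(&list, &i, |);
    %do %while (%length(&group) > 0);
        %put NOTE: group &i = &group;
        %let i = %eval(&i + 1);
        %let group = %scan(&list, &i, |);
    %end;
%mend show_groups;

%let v_lst = a b bb: t_v129 |
             c tt: t_v16 t_v275 |
             d: t_v56 |
;

%show_groups(&v_lst)
```

Each group is itself a valid SAS variable list, so inside the loop it could be passed on to a statement such as keep or var, where bb: would expand to every variable whose name starts with bb.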
Suppose I have a column called ABC that holds values like:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
I want to generate a new variable in the next column containing 1122 whenever ABC contains that exact value. I have a long list of values that I need to check against column ABC, and I only want exact matches: 112233 should not count as a match, because it is not the value I am looking for.
For example, in all three lines above, the only piece I want to match is "1122".
I have no idea how to solve this. I tried wildcards without much success. Any help would be much appreciated.
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. It searches a string for a matching word, with an option to specify which characters count as separators between words. The result is the position where the word starts within the longer string. When the word is not found, the result is zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;
I need to use a local macro to loop over part of a variable name in Stata.
Here is what I tried to do:
local phth mep mibp mbp
tab lod_`phth'_BL
Stata will not recognize the entire variable name.
variable lod_mep not found
r(111);
If I remove the underscore after the `phth' it still does not recognize anything after the macro name.
I want to avoid using a complicated foreach loop.
Is there any way this can be done just using the simple macro?
Thanks!
Your request is a bit confusing. The attempt fails because `phth' expands to the entire list, so Stata sees tab lod_mep mibp mbp_BL. Handling each element in turn is precisely the purpose of a loop, and loops in Stata are (at the "introductory level") quite simple. The following example is a bit nonsensical (and given the structure, there are easier ways of going about this), but should convey the basic idea.
// set up a similar variable name structure
sysuse auto , clear
rename (price mpg weight length) ///
(pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)
// define a local macro to hold the elements to loop over
local varbases = "base1 base2 base3 base4"
// refer to the items of the local macro in a loop
foreach b of local varbases {
summ pref_`b'_suff
}
See help foreach for the syntax of foreach. In particular, note that the structure employed above may not even be required due to Stata's varlist structure (see help varlist). For example, continuing with the code above:
foreach v of varlist pref_base?_suff {
summ `v'
}
The wildcard ? takes the place of one character. * could be used for more flexibility. However, if your variables are not as easily identifiable using the pattern matching allowed by varlist, a loop as in the first example is simple enough -- four very short lines of code.
Postscript
Upon further reflection (sometimes the structure of the question anchors a certain method, when an alternative approach is more straightforward), searching the help files for information on the tabulate command (help tabulate) will direct you to the following syntax: tab1 varlist [if] [in] [weight] [, tab1_options]
Given the discussion above about the use of varlists, you can simply code
tab1 lod_m*_BL
assuming, of course, that there are no other variables matching the pattern for which you do not want to report a frequency table. Alternatively,
tab1 lod_mep_BL lod_mibp_BL lod_mbp_BL
is not much longer and does the trick, albeit without the use of any sort of wildcard or macro substitution.
I have a SAS macro that depends on two varying values, Age and Year, represented by &Age and &Year respectively.
I want to run the macro for each combination of the ages 15-18 and years 2007-2010 and wish to create an output table for each of these combinations (i.e. 16 tables in total).
Here is my problem: inside the macro I try to name the output table Matrix_pop_adm&Age_RP&Year, which SAS doesn't like.
How can I name my output table correctly?
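Because you are concatenating macro variables with other text, you need a dot to mark where a macro variable name ends. Underscores and letters are valid in names, so without the dot SAS tries to resolve &Age_RP as a single macro variable. The trailing &Year needs no dot as long as no name character follows it.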
Matrix_pop_adm&Age._RP&Year
/* ^ force &AGE to resolve, instead of trying to resolve &Age_RP */
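To cover all 16 combinations, the name can be built inside two nested %do loops. This is only a sketch: the wrapper macro run_all is hypothetical, and the %put stands in for the call to your own macro, whose name and parameters are not given in the question.

```sas
/* Hypothetical wrapper: loop over every Age/Year combination */
%macro run_all;
    %local Age Year;
    %do Age = 15 %to 18;
        %do Year = 2007 %to 2010;
            /* Replace this %put with the call to your own macro. */
            /* The dot after &Age ends the macro variable name.   */
            %put NOTE: output table is Matrix_pop_adm&Age._RP&Year;
        %end;
    %end;
%mend run_all;

%run_all
```

The log should then list names such as Matrix_pop_adm15_RP2007, one per combination.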
I'm trying to use PROC FREQ on a subset of my data called dataname. I would like it to include all rows where varname doesn't equal "A.Never Used". I have the following code:
proc freq data=dataname(where=(varname NE 'A.Never Used'));
run;
I thought there might be a problem with trailing or leading blanks so I also tried:
proc freq data=dataname(where=(strip(varname) NE 'A.Never Used'));
run;
My guess is for some reason my string values are not "A.Never Used" but whenever I print the data this is the value I see.
This is a common issue in dealing with string data (and a good reason not to!). You should consider the source of your data. Did it come from web forms? Then it probably contains nonbreaking spaces ('A0'x) instead of regular spaces ('20'x). Did it come from a Unicode environment (say, one where Japanese characters are legal)? Then you may have transcoding issues.
A few options that work for a large majority of these problems:
Compress out everything but alphabetic characters: where=(compress(varname,,'ka') ne 'ANeverUsed') for example. In 'ka', k means 'keep' and a means 'alphabetic characters'.
Apply UPCASE or LOWCASE to both sides to ensure you're not running into case issues.
Use put varname $HEX60.; in a data step to look at the underlying bytes (the character format is $HEXw., with w at least twice the number of characters you want to see). Each pair of hex digits is one byte, which is one character in a single-byte encoding. 20 is a regular space (which strip would remove from the ends). Sort by varname before doing this so that the rows you think should have this value sit next to each other, then look for the difference. It is probably some special character, or multibyte characters, or something else entirely, but it should be apparent here.
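A minimal sketch of the hex inspection, using the dataname and varname from the question; the width of 60 is an assumption and shows only the first 30 bytes of each value.

```sas
/* Write each value to the log as text and as hex, side by side */
data _null_;
    set dataname;
    put varname= / varname $hex60.;
run;
```

For a value that really is 'A.Never Used' in an ASCII-based session, the hex line would start with 412E4E65766572205573656420 followed by 20s for the padding blanks. An A0 where a 20 is expected would point to a nonbreaking space.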
I have a problem at work: I have merged two datasets, and a number of variables hold the same content, but an observation that has a value in the variable from dataset 1 has a missing value in the one from dataset 2. So I need to transfer the values from one variable into the other.
This is my best shot so far:
replace V23=1 if V232==1
replace V23=2 if V232==2
replace V23=3 if V232==3
replace V23=4 if V232==4
replace V23=8 if V232==8
replace V23=.u if V232==10 | V232==9
However, it is tedious to do that for 40+ variables, and since some of them are numerical variables, it becomes a Sisyphean task.
Here's a start:
foreach v of varlist V23 {
    local w `v'2
    replace `v' = `w' if missing(`v')
    replace `v' = .u if `w' == 10 | `w' == 9
}
Notice how this solution relies on a lexical relationship among the variable names: it assumes the old variable "V23" is associated with the new variable "V232". Variable names in Stata are case-sensitive, so the names in the varlist must match yours exactly; add the other variables to the varlist to process them in the same loop. You can make a list of such associations and use it, but this is inconvenient. It's probably easier to rename the variables, if necessary, to conform to such a convention, then run the replacement script, and then restore the desired names.
If you're unfamiliar with this kind of automation, read the help pages for macro and foreach.