SAS: cleaning data using IF-THEN statements

I am trying to clean up the language values in my data, but when I print the dataset the values are unchanged. I placed this statement right after my SET statement.
Here is my code:
if lang in ("spanish, Engglishish, ssanklish, England) then lang="English";
run;

If you only want to match a single string, you can use the equality operator instead of the IN operator.
if lang = "Engglish, Engglishish, Engl, England, English, Englishh, Englisj, Englsh, En, Inglish, Old, NewEnglish, Oldenglish, Onglish, english"
then lang="English"
;
If you want to match any of the values in your list, make each one its own string literal. Note that the IN operator in SAS accepts spaces as well as commas between the items.
if lang in ("Engglish" "Engglishish" "Engl" "England" "Englishh" "Englisj" "Englsh"
"En" "Inglish" "Old" "NewEnglish" "Oldenglish" "Onglish" "english")
then lang="English"
;
Make sure to use that in a valid data step.
data want;
set have;
if lang in ("Engglish" "Engglishish" "Engl" "England" "Englishh" "Englisj" "Englsh"
"En" "Inglish" "Old" "NewEnglish" "Oldenglish" "Onglish" "english")
then lang="English"
;
run;
Do you really want to change Old to English?
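A possible refinement, which is not part of the answer above: comparing an upper-cased, stripped copy of the value means one list entry covers every case variant. The sample dataset and its values below are made up for illustration. Note also that lang must be at least 7 characters long, otherwise "English" will be truncated when it is assigned.

```sas
/* Hypothetical sample data; the values are illustrative only. */
data have;
  length lang $20;
  input lang $;
  datalines;
Engglish
ENGLSH
spanish
inglish
;
run;

data want;
  set have;
  /* Compare on a stripped, upper-cased copy so case variants share one list entry. */
  if upcase(strip(lang)) in ("ENGGLISH" "ENGLSH" "INGLISH" "ENGLISH")
    then lang = "English";
run;

proc print data=want;
run;
```

Values that are not in the list, such as spanish here, are left untouched.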

Related

Convert Number with Format into a String

How do you convert a number or currency variable into a character string that keeps the format as part of the string?
For instance, the below code has a character variable, MSRP_to_text, and currency variable, MSRP. When I set MSRP_to_text equal to MSRP, it takes the unformatted number and converts it to a string, so the dollar sign and the comma are gone.
DATA want;
SET SASHELP.CARS(KEEP=MSRP);
ATTRIB MSRP_to_text FORMAT=$8.;
MSRP_to_text = MSRP;
RUN;
In other words, the code is currently converting $36,945 -> "36945", but what I really want is $36,945 -> "$36,945".
Is there a way to keep the dollar sign and comma in the string?
The VVALUE function retrieves the formatted value of a variable.
MSRP_as_text = VVALUE(MSRP);
VVALUEX goes one step further and handles the case where the variable name is dynamic, for example stored in another variable or computed from some naming pattern.
name = 'MSRP';
formatted_value = VVALUEX(name);
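For completeness, here is a minimal sketch of VVALUE inside a full data step. The length of 8 is my assumption, chosen to match the DOLLAR8. format on MSRP; if no length is declared, a variable assigned from VVALUE defaults to length 200. STRIP is applied in case the formatted value carries leading blanks.

```sas
data want;
  set sashelp.cars(keep=MSRP);
  /* Declare the length first; otherwise the new variable defaults to $200. */
  length MSRP_as_text $8;
  /* STRIP removes any leading blanks left over from the format width. */
  MSRP_as_text = strip(vvalue(MSRP));
run;

proc print data=want(obs=5);
run;
```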
Instead of the ATTRIB statement, use the PUT function to convert the number to a character string; the result keeps the formatted text. Since the original format of MSRP is DOLLAR8., using the same format in the PUT function is sufficient.
DATA want;
SET SASHELP.CARS(KEEP=MSRP);
MSRP_to_text = put(MSRP, DOLLAR8.);
RUN;
proc contents data=want; run;

SAS &SYSERRORTEXT variable removing quote to use in SQL

I'm trying to use SAS system variables to track batch program execution (SYSERRORTEXT, SYSERR, SYSCC, ...)
I want to insert those in a dataset, using PROC SQL, like this :
Table def :
PROC SQL noprint;
CREATE TABLE WORK.ERROR_&todaydt. (
TRT_NM VARCHAR(50)
,DEB_EXE INTEGER FORMAT=DATETIME.
,FIN_EXE INTEGER FORMAT=DATETIME.
,COD_ERR VARCHAR(10)
,LIB_ERR VARCHAR(255)
,MSG_ERR VARCHAR(1000)
)
;
RUN;
QUIT;
Begin execution :
PROC SQL noprint;
INSERT INTO WORK.ERROR_&todaydt. VALUES("&PGM_NAME", %sysfunc(datetime()), ., '', '', '');
RUN;
QUIT;
End execution :
PROC SQL noprint;
UPDATE WORK.ERROR_&todaydt.
SET
FIN_EXE = %sysfunc(datetime())
,COD_ERR = "&syserr"
,LIB_ERR = ''
,MSG_ERR = "&syserrortext"
WHERE TRT_NM = "&PGM_NAME"
;
RUN;
QUIT;
The problem occurs with the system variable &syserrortext, which may contain special characters, especially a single quote ('), like this.
Code example for the problem:
DATA _NULL_;
set doesnotexist;
RUN;
and so &syserrortext gives us: ERROR: Le fichier WORK.DOESNOTEXIST.DATA n'existe pas. (in English: the file WORK.DOESNOTEXIST.DATA does not exist)
My UPDATE statement fails in this case, so how can I remove special characters from &syserrortext?
One approach, which avoids the need to remove special characters, is to use symget(), for example:
,MSG_ERR = symget('syserrortext')
Make sure to quote the value. First use macro quoting in case the value contains characters that might cause trouble. Then add quotes so that the value becomes a string literal. Use the quote() function to add the quotes in case the value contains quote characters already. Use the optional second parameter so that it uses single quotes in case the value contains & or % characters.
,MSG_ERR = %sysfunc(quote(%superq(syserrortext),%str(%')))
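To see what that expression produces without having to provoke a real error, here is a small sketch that applies the same quoting to a hand-made macro variable. The name demo_msg and its value are hypothetical stand-ins for &syserrortext.

```sas
/* Hypothetical stand-in for SYSERRORTEXT, containing an unmatched apostrophe. */
%let demo_msg = %str(Le fichier WORK.DOESNOTEXIST.DATA n%'existe pas.);

/* Show the generated string literal in the log. */
%put %sysfunc(quote(%superq(demo_msg),%str(%')));

/* Use the literal in an assignment; the value should round-trip unchanged. */
data check;
  length msg $1000;
  msg = %sysfunc(quote(%superq(demo_msg),%str(%')));
  put msg=;
run;
```

The QUOTE function doubles any embedded quote character of the kind it wraps with, so the generated literal should be valid in the SET clause of the UPDATE statement as well.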

how to set missing values to NULL in SAS

I am trying to set missing values to NULL in a SAS dataset for a numeric variable.
How can I do this?
Or is missing the same as NULL in SAS?
If you're asking how to have the period not display for a missing value, you can use:
options missing=' ';
That, however, doesn't actually change them to null, but rather to a space. SAS must have some character to display for missing; it won't allow no character. You could also pick another character, like:
options missing=%sysfunc(byte(255));
or even
options missing="%sysfunc(byte(0))";
I don't recommend the latter, because it causes some problems when SAS tries to display it.
You can then trim out the space (using trimn() which allows zero length strings) if you are concatenating it somewhere.
Taking the question very literally, and assuming that you want to display the string NULL for any missing values - one approach is to define a custom format and use that:
proc format;
value nnull
.a-.z = 'NULL'
. = 'NULL'
._ = 'NULL'
;
run;
data _null_;
do i = .a,., ._, 1,1.11;
put i nnull.;
end;
run;
You can set a numeric variable to missing within a data step:
age=.;
To check for missing numeric values, use:
if numvar=. then do;
or use the MISSING function:
if missing(var) then do;
IS NULL and IS MISSING are used in the WHERE clause.
See: http://www.sascommunity.org/wiki/Tips:Use_IS_MISSING_and_IS_NULL_with_Numeric_or_Character_Variables
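As a short illustration of the WHERE-clause operators, here is a sketch with a made-up dataset (the names have, id and age are hypothetical). It uses IS MISSING in a data step WHERE statement and IS NOT NULL in PROC SQL.

```sas
/* Hypothetical sample data with one missing numeric value. */
data have;
  input id age;
  datalines;
1 34
2 .
3 51
;
run;

/* WHERE statement in a data step: keep only rows where age is missing. */
data missing_age;
  set have;
  where age is missing;
run;

/* The same kind of test in PROC SQL, here negated. */
proc sql;
  create table known_age as
  select *
  from have
  where age is not null;
quit;
```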

Convert multiple variables from char to numeric

I have been working on converting a number of variables in my table to numerical types from characters. I discovered the method to alter one variable and can continue doing so for each variable. However, I wanted to solicit SE because I am having trouble developing a sustainable solution.
How can I edit multiple variables at once in SAS Studio 3.5?
My attempt thus far:
What works:
data work.want(rename=(age_group='Age Group'n));
set work.import;
age_group=input('Age Group'n,8.);
drop 'Age Group'n;
run;
What doesn't work:
data work.want(rename=(age_group='Age Group'n), rename=(dwelling_type='Dwelling Type'n));
set work.import;
age_group=input('Age Group'n,8.);
dwelling_type=input('Dwelling Type'n,8.);
drop 'Age Group'n, 'Dwelling Type'n;
run;
For starters, your RENAME= dataset option is incorrect. I don't recommend using that type of variable notation though, so I'm going to suggest labels instead. To convert multiple variables, use an array. You still have to list the variables out at least once, in the ARRAY statements.
data work.want;
set work.import;
array num_vars(*) age_group dwelling_type;
array char_vars(*) 'Age Group'n 'Dwelling Type'n;
do i=1 to dim(num_vars);
num_vars(i) = input(char_vars(i), 8.);
end;
label age_group = 'Age Group'
dwelling_type = 'Dwelling Type';
run;
If you wanted to do a RENAME as a dataset option, you would do it as follows: no commas, and the keyword rename only once.
(rename=(age_group='Age Group'n dwelling_type='Dwelling Type'n));
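Put together with the rest of the step from the question, that could look like the sketch below. The small work.import dataset is a hypothetical stand-in for the imported table, and VALIDVARNAME=ANY is assumed so that name literals such as 'Age Group'n are allowed.

```sas
options validvarname=any;  /* assumed: permits name literals like 'Age Group'n */

/* Hypothetical stand-in for the imported table: both columns are character. */
data work.import;
  length 'Age Group'n 'Dwelling Type'n $8;
  input 'Age Group'n $ 'Dwelling Type'n $;
  datalines;
1 2
3 1
;
run;

data work.want(rename=(age_group='Age Group'n dwelling_type='Dwelling Type'n));
  set work.import;
  age_group = input('Age Group'n, 8.);
  dwelling_type = input('Dwelling Type'n, 8.);
  /* The DROP statement takes a space-separated list, no commas. */
  drop 'Age Group'n 'Dwelling Type'n;
run;

proc contents data=work.want;
run;
```

The old character columns are dropped first, which frees their names for the RENAME= option on the output dataset.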

How do I replace a variable value with a another value, character and numeric?

I am trying to replace specific variable values with either a character, or numeric, value on a case by case basis.
My code for changing to value of "no" to "NULL" is as follows:
DATA tp_01_pa_remove_no;
SET tp_01_pa_renamed;
IF variable_name="no" THEN "NULL";
RUN;
I also want to replace additional values:
DATA tp_01_pa_remove_nulls;
SET tp_01_pa_renamed;
IF PAFB_OTHERACTIV_4A1="no" OR "none" OR "None" OR "N/A" THEN PAFB_OTHERACTIV_4A1="NULL";
RUN;
To replace a value that is exactly no, you would do the following:
data tp_01_pa_remove_no;
set tp_01_pa_renamed;
if(variable_Name = "no") then variable_name = "NULL";
run;
This is assuming that variable_name has at least a length of 4.
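The second data step in the question fails for a different reason: in a condition like x="no" OR "none", the string "none" is a separate operand of OR and is never compared with x. A sketch of one way around that is to use the IN operator with the same four values. The sample rows below are made up for illustration.

```sas
/* Hypothetical sample data standing in for tp_01_pa_renamed. */
data tp_01_pa_renamed;
  length PAFB_OTHERACTIV_4A1 $20;
  input PAFB_OTHERACTIV_4A1 $;
  datalines;
no
None
N/A
swimming
;
run;

data tp_01_pa_remove_nulls;
  set tp_01_pa_renamed;
  /* IN compares the variable against every value in the list. */
  if PAFB_OTHERACTIV_4A1 in ("no" "none" "None" "N/A")
    then PAFB_OTHERACTIV_4A1 = "NULL";
run;

proc print data=tp_01_pa_remove_nulls;
run;
```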
As an alternative, if you're used to Excel, there are IFN and IFC, which are Excel-style functions. The first argument is the 'if' condition, the second is returned 'if true', the third is returned 'if false', and the optional fourth is returned 'if missing/null' (which normally doesn't happen).
data want;
set have;
variable_name = ifc(variable_name='no','NULL',variable_name);
run;
IFN of course returns a numeric. (Neither function cares what type the first argument is, by the way.)
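A minimal IFN sketch for the numeric case, using made-up names (demo, score, passed) and a made-up pass mark of 50:

```sas
data demo;
  do score = 40, 75;
    /* IFN(condition, value-if-true, value-if-false) */
    passed = ifn(score >= 50, 1, 0);
    output;
  end;
run;

proc print data=demo;
run;
```

One thing to keep in mind is that IFN and IFC evaluate all of their arguments regardless of the condition, so they are not a way to guard against something like a division by zero in the branch that is not returned.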