Converting numerics with COMMAw.d - sas

I am following the SAS help page and trying to reproduce the results of the examples at the bottom of the page.
My code:
data _null_;
test=23451.23;
result=input(test,comma10.2);
put 'this should be:' result;
run;
with the output in the Log
this should be:23451
while it should be 23,451.23. There are no errors, warnings, or helpful notes.
When I am not using the input function, it delivers the correct result
data _null_;
test=23451.23;
format test comma10.2;
put 'this should be:' test;
run;
What is happening here? Is it not possible to combine input and COMMAw.d?

Formats are used to convert values into strings. Informats are used to convert strings into values. You use formats with PUT and FORMAT statements or PUT() function. You use informats with INPUT and INFORMAT statements or INPUT() function.
So the INPUT() function needs a character string as its first argument, but you have given it a number. Notice that SAS will put a NOTE in the log saying that it had to convert numbers to characters. SAS uses the BEST12. format for that conversion, so your number, 23451.23, becomes the 12-character string '    23451.23' (right-aligned, with four leading blanks). When the INPUT() function then applies the COMMA10.2 informat, it reads only the first 10 characters, '    23451.', and you lose the decimal digits. Not only should the width be larger, but you should NOT have included a value after the decimal point in the informat. If your informat had an even smaller width, you would have missed the decimal point as well, and SAS would have implied one (divided the integer value by 100).
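The width effect described above can be reproduced by making the implicit conversion explicit. This is only a minimal sketch: the variable names as_char, narrow and wide are made up for illustration, and PUT(test, best12.) stands in for the automatic conversion that SAS performs.

```sas
data _null_;
  test = 23451.23;
  /* Explicit stand-in for the automatic numeric-to-character conversion */
  as_char = put(test, best12.);
  /* COMMA10.2 only sees the first 10 of the 12 characters */
  narrow = input(as_char, comma10.2);
  /* COMMA12. is wide enough to see the whole string */
  wide = input(as_char, comma12.);
  /* Brackets make the leading blanks visible in the log */
  put '[' as_char $char12. ']';
  put narrow= wide=;
run;
```

If the explanation holds, narrow should come out as 23451 and wide as 23451.23.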
If you want your numbers to be displayed in a particular way then just attach the desired format to the variable. You could just add the format specification to the PUT statement.
put test= comma10.2 ;
Or attach the format to the variable using the FORMAT statement.
format test comma10.2;
If you want to convert your number into a character string then use the PUT() function
char_result = put(test,comma10.2);
or PUTN() function.
char_result = putn(test,'comma10.2');

Use the PUT() function:
data _null_;
test=23451.23;
result=put(test,comma10.2);
put 'this should be:' result;
run;
this should be:23,451.23
Link explaining the differences between the functions.
Table from page:
+---------------------------+-------------------+------------+---------------------+-----------------+
| Function Call             | Raw Type          | Raw Value  | Returned Type       | Returned Value  |
+---------------------------+-------------------+------------+---------------------+-----------------+
| A PUT(name, $10.);        | char, char format | ‘Richard’  | char always         | ‘Richard ’      |
| B PUT(age, 4.);           | num, num format   | 30         | char always         | ‘ 30’           |
| C PUT(name, $nickname.);  | char, char format | ‘Richard’  | char always         | ‘Rick’          |
| D INPUT(agechar, 4.);     | char always       | ‘30’       | num, num informat   | 30              |
| E INPUT(agechar, $4.);    | char always       | ‘30’       | char, char informat | ‘ 30’           |
| F INPUT(cost,comma7.);    | char always       | ‘100,541’  | num, num informat   | 100541          |
+---------------------------+-------------------+------------+---------------------+-----------------+

Related

How to divide all the observations based on a sum of a column

I'm trying to do simple calculations but I'm new and SAS is not intuitive to me.
Suppose I have this table.
data money;
infile datalines delimiter=",";
input name $ return $ invested;
datalines;
Joe,10,100
Bob,7,50
Mary,80,1000
;
Which creates this
/* name | return | invested */
/* _________________________ */
/* Joe | 10 | 100 */
/* Bob | 7 | 50 */
/* Mary | 80 | 50 */
I have three things I would like to do for my job that just switched over to SAS.
1. I need to make sure the columns return and invested are numeric. When I run the code above, the return column ends up being a CHAR column and I don't know why.
2. Now I want to create a new column and calculate the share of the total return they each got. In this case, the sum of return=97. This is the result I want.
/* name | return | invested | share_of_return */
/* ____________________________________________ */
/* Joe | 10 | 100 | 10.30% */
/* Bob | 7 | 50 | 7.22% */
/* Mary | 80 | 50 | 82.47% */
3. Next I want to find their ROI, which is (return-investment) / investment * 100. This is the result I am looking for
/* Find ROI */
/* name | return | invested | share_of_return | ROI */
/* ___________________________________________________ */
/* Joe | 10 | 100 | 10.30% | -90% */
/* Bob | 7 | 50 | 7.22% | -86% */
/* Mary | 80 | 50 | 82.47% | 60% */
I appreciate your explanations and guidance in advance. This is for a work project and we just switched over to SAS.
1 & 3 are easy, 2 is slightly more difficult.
For 1: remove the $ after return in the INPUT statement. $ marks the variable as character. With your actual data you may need to convert it using the INPUT() function instead, though.
Fix for example:
input name $ return invested;
Fix for actual data using input function. Note that you cannot convert types in a data step to the same name so I rename it while reading it in using the rename data set option.
data money2;
set money (rename=(return=return_char));
return = input(return_char, best.);
drop return_char;
run;
For 2: add the total to every row. SQL is the quickest way to do that here:
proc sql;
create table money3 as
select *,
sum(return) as return_total,
return / calculated return_total as return_percentage format=percent12.1
from money2;
quit;
I outline two different methods of doing this here
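One data-step alternative to the SQL step could look like the sketch below. It assumes the money2 data set from above with a numeric return column; the data set name totals is made up for illustration, while return_total and return_percentage mirror the names used in the SQL version.

```sas
/* Step 1: compute the column total once and store it in a one-row data set */
proc means data=money2 noprint;
  var return;
  output out=totals(keep=return_total) sum=return_total;
run;

/* Step 2: read the total on the first iteration only; variables read
   with SET are retained, so the total is available on every row */
data money3;
  if _n_ = 1 then set totals;
  set money2;
  return_percentage = return / return_total;
  format return_percentage percent12.1;
run;
```

Unlike the SQL version, this avoids the automatic remerge of the summary statistic, at the cost of an extra step.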
For 3: add your calculation within a data step. It's probably most efficient if it can be done in the first step.
Since a data step loops automatically you write the formula pretty much as shown. In this case I've also applied a format so it shows as a percentage but that requires you to not multiply it by 100. Depending on what you're doing next it may be best to leave it as numeric.
data money2;
set money (rename=(return=return_char));
return = input(return_char, best.);
ROI = (return - invested)/invested;
format ROI percent12.1;
drop return_char;
run;

How can I add observations to the existing dataset based on dates?

I have a dataset like this:
data have;
input date :date9. index;
format date date9.;
datalines;
31MAR2019 10
30APR2019 12
31MAY2019 15
30JUN2019 14
;
run;
I would like to add observations with dates from the maximum date (hence from 30JUN2019) until 31DEC2019 (by months) with the value of index being the last available value: 14. How can I achieve this in SAS? I want the code to be flexible, thus for every such dataset, take the maximum of date and add monthly observations from that maximum until DEC2019 with the value of index being equal to the last available value (here in the example the value in JUN2019).
An explicit DO loop over the SET provides the foundation for a concise solution with no extraneous worker variables. Automatic variable last is automatically dropped.
data have;
input date :date9. index;
format date date9.;
datalines;
31MAR2019 10
30APR2019 12
31MAY2019 15
30JUN2019 14
;
data want;
do until (last);
set have end=last;
output;
end;
do last = month(date) to 11; %* repurpose automatic variable last as a loop index;
date = intnx ('month',date,1,'e');
output;
end;
run;
Always helpful to refresh understanding. From SET Options documentation
END=variable
creates and names a temporary variable that contains an end-of-file indicator. The variable, which is initialized to zero, is set to 1 when SET reads the last observation of the last data set listed. This variable is not added to any new data set.
You can do it using the END= option in the SET statement together with a RETAIN statement.
data want(drop=i tIndex tDate);
set have end=eof;
retain tIndex tDate;
if eof then do;
tIndex=Index;
tDate=Date;
end;
output;
if eof then do;
do i=1 to 12-month(tDate);
index=tIndex;
date = intnx('month',tDate,i,'e');
output;
end;
end;
run;
INPUT:
+-----------+-------+
| date | index |
+-----------+-------+
| 31MAR2019 | 10 |
| 30APR2019 | 12 |
| 31MAY2019 | 15 |
| 30JUN2019 | 14 |
+-----------+-------+
OUTPUT:
+-----------+-------+
| date | index |
+-----------+-------+
| 31MAR2019 | 10 |
| 30APR2019 | 12 |
| 31MAY2019 | 15 |
| 30JUN2019 | 14 |
| 31JUL2019 | 14 |
| 31AUG2019 | 14 |
| 30SEP2019 | 14 |
| 31OCT2019 | 14 |
| 30NOV2019 | 14 |
| 31DEC2019 | 14 |
+-----------+-------+

How to split a combination of numeric and characters into multiple columns

I want to split some variable "15to16" into two columns where for that row I want the values 15 and 16 in each of the column entries. Hence, I want to get from this
+-------------+
| change |
+-------------+
| 15to16 |
| 9to8 |
| 6to5 |
| 10to16 |
+-------------+
this
+-------------+-----------+-----------+
| change | from | to |
+-------------+-----------+-----------+
| 15to16 | 15 | 16 |
| 9to8 | 9 | 8 |
| 6to5 | 6 | 5 |
| 10to16 | 10 | 16 |
+-------------+-----------+-----------+
Could someone help me out? Thanks in advance!
data have;
input change $;
cards;
15to16
9to8
6to5
10to16
;
run;
data want;
set have;
from = input(scan(change,1,'to'), 8.);
to = input(scan(change,2,'to'), 8.);
run;
N.B. in this case the scan function is using both t and o as separate delimiters, rather than looking for the word to. This approach still works because scan by default treats multiple consecutive delimiters as a single delimiter.
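A small sketch of the delimiter behaviour noted above. The second value, '15ot16', is a made-up input chosen only to show that the order of t and o does not matter to scan.

```sas
data _null_;
  /* t and o are two independent delimiter characters, not the word "to" */
  a = scan('15to16', 2, 'to');
  /* The reversed pair "ot" splits the string in exactly the same way */
  b = scan('15ot16', 2, 'to');
  put a= b=;
run;
```

Both calls should return 16, which is why this approach is only safe when the values contain nothing but digits around the separator.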
Regular expressions with the metacharacter () define groups whose contents can be retrieved from capture buffers with PRXPOSN. The capture buffers retrieved in this case would each hold one or more consecutive digits (\d+), which are converted to a numeric value with INPUT.
data have;
input change $20.;
datalines;
15to16
9to8
6to5
10to16
;
run;
data want;
set have;
rx = prxparse('/^\s*(\d+)\s*to\s*(\d+)\s*$/');
if prxmatch (rx, change) then do;
from = input(prxposn(rx,1,change), 12.);
to = input(prxposn(rx,2,change), 12.);
end;
drop rx;
run;
You can get the answer you want by declaring the delimiter when you create the dataset. However, you did not provide enough information about your other variables and how you import them.
Data want;
INFILE datalines DELIMITER='to';
INPUT from to;
datalines;
15to16
9to8
6to5
10to16
;
Run;

Compare Value of Current Observation with First Observation

I have a set of multiple choice responses from a survey with 45 questions, and I've placed the correct responses as my first observation in the dataset.
In my DATA step I would like to set values to 0 or 1 depending on whether the variable in each observation matches the same variable in the first observation. I want to replace the response letter (A-D) with the 0 or 1 in the dataset. How do I go about doing that comparison?
I'm not doing any grouping, so I believe I can access the first row using First.x, but I'm not sure how to compare that across each variable(answer1-answer45).
| Id  | answer1 | answer2 | ...through answer45 |
|:----|:--------|:--------|:--------------------|
| KEY | A       | B       | ...                 |
| 2   | A       | C       | ...                 |
| 3   | C       | D       | ...                 |
| 4   | A       | B       | ...                 |
| 5   | D       | C       | ...                 |
| 6   | B       | B       | ...                 |
Should become:
| Id  | answer1 | answer2 | ...through answer45 |
|:----|:--------|:--------|:--------------------|
| KEY | A       | B       | ...                 |
| 2   | 1       | 0       | ...                 |
| 3   | 0       | 0       | ...                 |
| 4   | 1       | 1       | ...                 |
| 5   | 0       | 0       | ...                 |
| 6   | 0       | 1       | ...                 |
Current code for reading in the data:
DATA TEST(drop=name fill answer0);
INFILE SCORES DSD firstobs=2;
length id $4;
length answer1-answer150 $1;
INPUT name $ fill id $ (answer0-answer150) ($);
RUN;
Thanks in advance!
Here's how I might do it. Create a data set to PROC COMPARE the KEY to the observed. Then you have X for not matching key and missing for matched. You can then use PROC TRANSREG to score the 'X.' to 01. PROC TRANSREG also creates macro variables which contain the names of the new variables and the number.
From log NOTE: _TRGINDN=2 _TRGIND=answer1D answer2D
data questions;
input id:$3. (answer1-answer2)(:$1.);
cards;
KEY A B
2 A C
3 C D
4 A B
5 D C
6 B B
;;;;
run;
data key;
if _n_ eq 1 then set questions(obs=1);
set questions(keep=id firstobs=2);
run;
proc compare base=key compare=questions(firstobs=2) out=comp outdiff noprint;
id id;
run;
options validvarname=v7;
proc transreg design data=comp(drop=_type_ type=data);
id id;
model class(answer:) / noint;
output out=scored(drop=intercept _:);
run;
%put NOTE: &=_TRGINDN &=_TRGIND;
I don't have my SAS license here at home, so I can't actually test this code. I'll give it my best shot, though ...
First, I'd keep my correct answers in a separate table, and then merge it with the answers from the respondents. That also makes the solution scalable, should you have more multiple choice solutions and answers in the same table, since you'd be joining on the assignment ID as well.
Now, import all your correct answers to a table answers_correct with column names answer_correct1-answer_correct45.
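Then, combine the two tables and determine the outcome for each question. A MERGE without BY would pair the single row of correct answers only with the first respondent, so the key row is read once with SET and retained for every respondent instead.
DATA outcome;
* Read the one row of correct answers on the first iteration only, SET variables are retained;
IF _N_ = 1 THEN SET answers_correct;
SET answers;
* If you later add more questionnaires, MERGE BY the questionnaire ID instead;
ARRAY answer(*) answer1-answer45;
ARRAY answer_correct(*) answer_correct1-answer_correct45;
LENGTH result1-result45 $1;
ARRAY result(*) result1-result45;
DROP i;
DO i = 1 TO DIM(answer);
IF answer(i) = answer_correct(i) THEN result(i) = '1';
ELSE result(i) = '0';
END;
RUN;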
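If the key has to stay as the first observation of the same data set, as in the question, a temporary array can hold it. The following is a sketch only, written against the two-answer questions data set from the earlier answer. The names scored and key are made up for illustration, and the array size 2 would have to become 45 for the real survey.

```sas
data scored;
  set questions;
  array answer(*) answer1-answer2;
  /* Temporary arrays are retained across observations and never written out */
  array key(2) $1 _temporary_;
  if _n_ = 1 then do i = 1 to dim(answer);
    /* First observation: remember the correct answers, leave the row as is */
    key(i) = answer(i);
  end;
  else do i = 1 to dim(answer);
    /* Every other observation: overwrite the letter with 1 or 0 */
    answer(i) = ifc(answer(i) = key(i), '1', '0');
  end;
  drop i;
run;
```

This overwrites the response letters in place, so the result stays character ('0' or '1'). Writing to separate numeric variables would be the safer choice if the scores are to be summed later.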

Stata: Comparing string variables

I have two string variables that differ on one character for each observation. I need to get the position of that different character.
I have tried to use indexnot() function but it yields false results as the characters in both strings are the same.
Here is an illustrative example, and variable position is the one I am trying to get to:
+--------------+--------------+-----------+
| String 1 | String 2 | Position |
+--------------+--------------+-----------+
| 000002002000 | 000000002000 | 6 |
| 000002102000 | 000002002000 | 7 |
| 000002112000 | 000002102000 | 8 |
| 000002112020 | 000002112000 | 11 |
| 000002112120 | 000002112020 | 10 |
+--------------+--------------+-----------+
gen Position = .
quietly forval j = 1/12 {
replace Position = `j' if substr(String1, `j', 1) != substr(String2, `j', 1) & missing(Position)
}
Commentary is perhaps redundant here, but will harm no-one.
In the absence of a built-in function to do this, you need to write some code using existing commands and functions. Initialise a Position to missing (zero would do fine as an alternative). Then loop over the characters, here 1 to 12 because the example shows 12 character strings. We record the position of the first difference in characters. Note how the condition missing(Position) (Position == . if you like) restricts changes to the first difference met.
Stata loops automatically over all the observations here, so the only loop needed is over string positions.