I have created the following SAS table:
DATA test;
INPUT name$ Group_Number;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
I would like to change group number from a character type into a numeric type.
Here is my attempt:
data test2;
set test;
Group_Number1 = input(Group_Number, best5.);
run;
The problem is that when I execute:
proc contents data = test2;
run;
The output table shows that Group_Number is still of character type. I think the problem may be the "best5." informat in my INPUT function call, but I am not 100% sure what is wrong.
How can I fix this?
If you have a character variable, your code will work. But you don't: Group_Number is numeric in your sample data. So either your sample data is not representative, or you don't have the problem you think you do.
Here's an example that you can run to see this.
*read group_number as numeric;
DATA test_num;
INPUT name$ Group_Number;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
Title 'Group_Number is Numeric!';
proc contents data=test_num;
run;
*read group_number as character;
DATA test_char;
INPUT name$ Group_Number $;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
data test_converted;
set test_char;
group_number_num = input(group_number, 8.);
run;
Title 'Group_Number is Character, group_number_num is Numeric';
proc contents data=test_converted;
run;
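If you would rather check a variable's type from inside a DATA step than read PROC CONTENTS output, the VTYPE function returns 'N' for a numeric variable and 'C' for a character one. A minimal sketch, assuming the TEST_NUM and TEST_CHAR datasets from the example above exist:

```sas
/* Write the type of GROUP_NUMBER in each dataset to the log */
data _null_;
  set test_num(obs=1);
  type_num = vtype(Group_Number);   /* expected: N */
  put 'TEST_NUM:  ' type_num=;
run;

data _null_;
  set test_char(obs=1);
  type_char = vtype(Group_Number);  /* expected: C */
  put 'TEST_CHAR: ' type_char=;
run;
```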
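Try this. PUT converts the numeric value to text and INPUT reads that text back as a number, so it works even when Group_Number is already numeric: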
data test2;
set test;
Group_Number1 = input(put(Group_Number,best5.),best5.);
run;
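The reason the original INPUT call gives a missing value on a numeric variable is SAS's automatic numeric-to-character conversion: INPUT expects a character argument, so SAS first converts the number with the BEST12. format, which right-aligns it in 12 characters. BEST5. then reads only the first 5 of those characters, which are all blanks. A small sketch (X, Y and Z are made-up names for illustration); in the log you should see Y missing and Z equal to 1, along with a note that numeric values were converted to character:

```sas
data _null_;
  x = 1;                               /* a numeric value, like GROUP_NUMBER */
  y = input(x, best5.);                /* x is auto-converted with BEST12. (leading blanks), so Y is missing */
  z = input(put(x, best5.), best5.);   /* explicit PUT keeps the digits within 5 characters, so Z is 1 */
  put y= z=;
run;
```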
I figured out the solution to my problem already, but I'd like to know exactly what is happening and why, or whether there is a workaround for the following.
Suppose you have:
data test;
length group $20.;
subject=1; hours=0; group= 'hour 1'; output;
subject=1; hours=1; group= 'hour 15'; output;
subject=1; hours=2; group= 'hour 15'; output;
subject=2; hours=0; group= 'hour 1'; output;
subject=2; hours=1; group= 'hour 15'; output;
subject=2; hours=2; group= 'hour 15'; output;
run;
You sort by hours first and then by group, because group is character and would not sort in the intended order otherwise.
proc sort data=test;
by subject hours group;
run;
Now when you run this code to retrieve only the first record of each group:
data test2;
set test;
by subject hours group;
if first.group;
run;
It outputs every record.
I recently learned that when you use more than one variable in the BY statement, a change in a higher-level BY variable (its FIRST./LAST. variable becoming 1) also sets the FIRST./LAST. variables of every BY variable after it to 1.
So, because hours changes on every record, FIRST.GROUP is also 1 on every record.
So why does this code work as expected?
data test2;
set test;
by subject group;
if first.group;
run;
It seems odd to have to leave out variables you sorted on, and it isn't very flexible: for example, you can't use the same macro variable list for both the PROC SORT BY statement and the DATA step BY statement. If this is just the way it is, is there a preferred way of doing this kind of operation? I can see myself making this mistake often by copying and pasting the list of sort variables.
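One way to see the cascading behavior is to copy the FIRST. flags into ordinary variables and print them. A minimal sketch, assuming the TEST dataset above has been sorted by subject, hours and group (the FIRST_ variable names are made up for illustration). Because each hours value is unique within a subject, FIRST_HOURS and therefore FIRST_GROUP should be 1 on every row:

```sas
data flags;
  set test;
  by subject hours group;
  /* FIRST. variables are temporary, so copy them to keep them in the output */
  first_subject = first.subject;
  first_hours   = first.hours;
  first_group   = first.group;
run;

proc print data=flags;
  var subject hours group first_subject first_hours first_group;
run;
```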
If you want to use a BY statement to generate FIRST. and LAST. variables for a grouped variable that is not actually sorted, use the NOTSORTED keyword on the BY statement.
For example you might want to order the data by HOUR and then group it by the STATUS so that you can find out at what hour they transitioned to that STATUS.
data have;
input subject hour status $;
cards;
1 0 C
1 1 B
1 2 B
1 3 D
2 0 A
2 1 D
2 2 D
;
data want ;
set have ;
by subject status notsorted;
if first.status;
run;
Result:
Obs subject hour status
1 1 0 C
2 1 1 B
3 1 3 D
4 2 0 A
5 2 1 D
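Applied to the TEST data from the question, a sketch of the same idea: keep the full sort list on PROC SORT, and on the DATA step BY statement list only the variables that define the groups, with NOTSORTED. With the question's data this should keep 4 rows: 'hour 1' at hours=0 and 'hour 15' at hours=1 for each subject.

```sas
proc sort data=test;
  by subject hours group;
run;

data test2;
  set test;
  /* NOTSORTED: a new group starts whenever SUBJECT or GROUP changes  */
  /* in the existing (hours) order, so HOURS is left off the BY list  */
  by subject group notsorted;
  if first.group;
run;
```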
I would like to represent the title and gender variables as numbers. What code do I need to add to do this?
DATA test;
INPUT title$ gender$ name$ age;
CARDS;
Mr Male Micheal 20
Mrs Female Stephanie 25
Mr Female Linda 30
Dr Male James 40
Dr Female Jane 45
;
run;
Below is my attempt. However, something is wrong, because the title and gender variables do not change!
proc format library = Work;
value $title_ 'Mr' = 1 'Mrs' = 2 'Dr' = 3;
value $gender_ 'Male' = 1 'Female' = 2;
run;
OPTIONS FMTSEARCH = (Work);
data test;
format $title $title_;
set test;
run;
You're nearly there; you just have slightly wrong syntax in your format statement. This is your current format statement:
format $title $title_;
Here's a corrected one. I've extended it to apply your gender format as well:
format title $title_. gender $gender_.;
You don't need to rewrite a dataset just to apply a format, as in:
data mydata;
set mydata;
format ...;
run;
You can apply one directly by using proc datasets instead of writing a data step like the one above, e.g.
proc datasets lib = work;
modify test;
format title $title_. gender $gender_.;
run;
quit;
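Note that a format only changes how values are displayed: TITLE and GENDER are still stored as character. If you need genuinely numeric variables, one option (not part of the answer above) is to define numeric informats with the INVALUE statement and read the text through them with the INPUT function. A minimal sketch; the informat names TITLEN and GENDERN and the new dataset and variable names are made up for illustration, and the matching is case-sensitive:

```sas
proc format library=work;
  /* numeric informats: map the text codes to numbers */
  invalue titlen  'Mr' = 1  'Mrs' = 2  'Dr' = 3;
  invalue gendern 'Male' = 1  'Female' = 2;
run;

data test_coded;
  set test;
  title_num  = input(title, titlen.);    /* numeric: 1, 2 or 3 */
  gender_num = input(gender, gendern.);  /* numeric: 1 or 2 */
run;
```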
proc sort data = group;
by studystyle;
run;
proc means data= group mean;
var test1 test2;
by studystyle;
output out = groupmeans mean = groupmeans;
run;
So I have this dataset of groups of students containing student ID, test1 scores, test2 scores, and their study styles.
I then created a new dataset of the means of these test scores by study style.
I am trying to create 2 new datasets based on the 2 tests; both datasets should include the study style, the mean, and the test number.
I figured I could start by creating a new dataset with the SET statement to read the previous dataset. However, I don't really know how to grab the test means for each study style. Instead, I used DATALINES to enter the mean values manually, but I would prefer to take those values from the previous dataset itself.
data newgroup1;
set groupmeans;
drop test1 test2 _type_ _freq_ _stat_;
input StudyStyle AVG Testnum;
datalines;
1 51.6875 1
2 49.27273 1
3 49.09091 1
;
run;
data newgroup2;
set groupmeans;
drop test2 test1 _type_ _freq_ _stat_;
input StudyStyle AVG Testnum;
datalines;
1 51.5 2
2 65.2727 2
3 90.5454 2
;
run;
data newgroup;
set newgroup1 newgroup2;
run;
If I understand your problem correctly, all you need to change is to name the means of test1 and test2 separately in the OUTPUT statement and then write two datasets. Try the code below.
proc sort data = group;
by studystyle;
run;
proc means data= group mean;
var test1 test2;
by studystyle;
output out = groupmeans mean(test1) = mtest1 mean(test2) = mtest2;
run;
data newgroup1 (keep=studystyle mtest1) newgroup2(keep=studystyle mtest2) ;
set groupmeans;
run;
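The question also asks for an average column and a test number in each dataset, with the two datasets stacked at the end. Building on the GROUPMEANS output above (with MTEST1 and MTEST2), a sketch that fills AVG and TESTNUM directly instead of typing the values in with DATALINES:

```sas
data newgroup1(keep=studystyle avg testnum)
     newgroup2(keep=studystyle avg testnum);
  set groupmeans;
  /* one row per study style in each output dataset */
  testnum = 1; avg = mtest1; output newgroup1;
  testnum = 2; avg = mtest2; output newgroup2;
run;

/* stack the two datasets, as in the original NEWGROUP step */
data newgroup;
  set newgroup1 newgroup2;
run;
```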
I am trying to extract all the Time occurrences for only the most recent visit. Can someone help me with the code, please?
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var time;
run;
data want;
set temp1;
by name date;
if last.name;
length value $2000;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing-value periods (.).
This is very similar to yesterday's question from another user, and there are quite a few solutions.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not in the GROUP BY, SAS allows them in the SELECT: it computes max(date) for each name, automatically remerges that summary value back onto the original HAVE rows, and then keeps the rows where date equals the maximum, returning multiple rows per name as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
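A sketch of that approach, assuming HAVE contains NAME, DATE and TIME as in the question (MAXDATES and MAX_DATE are made-up names for illustration):

```sas
/* one row per name with that name's most recent date */
proc sql;
  create table maxdates as
  select name, max(date) as max_date
  from have
  group by name
  order by name;
quit;

proc sort data=have;
  by name;
run;

/* attach MAX_DATE to every row and keep only the most recent visit */
data want;
  merge have maxdates;
  by name;
  if date = max_date;
run;
```

Each intermediate dataset can be printed on its own, which is what makes this version easy to troubleshoot.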
The DoW loop also appeals here. It is essentially the SAS DATA step equivalent of the SQL above: first iterate over all the records for a name to find the max date, then iterate over them again and output the ones with that date.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Here you can also collapse the rows more easily:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted, just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;
proc sort data=sas.mincome;
by F3 F4;
run;
PROC SORT doesn't sort the dataset by formatted values, only by internal values. I need to sort by two variables prior to a merge. Is there any way to do this with PROC SORT?
I don't think you can sort by formatted values in PROC SORT, but you can use a simple PROC SQL query to sort a dataset by formatted values, because the ORDER BY clause accepts expressions such as PUT(variable, format.).
The general syntax of proc sql for sorting by formatted values will be:
proc sql;
create table NewDataSet as
select variable(s)
from OriginalDataSet
order by put(variable1, format1.), put(variable2, format2.);
quit;
For example, we have a sample data set containing the names, sex and ages of some people and we want to sort them:
proc format;
value gender 1='Male'
2='Female';
value age 10-15='Young'
16-24='Old';
run;
data work.original;
input name $ sex age;
datalines;
John 1 12
Zack 1 15
Mary 2 18
Peter 1 11
Angela 2 24
Jack 1 16
Lucy 2 17
Sharon 2 12
Isaac 1 22
;
run;
proc sql;
create table work.new as
select name, sex format=gender., age format=age.
from work.original
order by put(sex, gender.), put(age, age.);
quit;
Output of work.new will be:
Obs name sex age
1 Mary Female Old
2 Angela Female Old
3 Lucy Female Old
4 Sharon Female Young
5 Jack Male Old
6 Isaac Male Old
7 John Male Young
8 Zack Male Young
9 Peter Male Young
If we had used PROC SORT by sex, Males would have come first, because 1 represents Male and 2 represents Female, which is not what we want. The output shows that PROC SQL did sort the rows by the formatted values (Females first, Males second).
Hope this helps.
Because of the nature of formats, SAS only uses the underlying values for the sort. To my knowledge, you cannot change that (unless you want to build your own translation table via PROC TRANTAB).
What you can do is create a new column that contains the formatted value. Then you can sort on that column.
proc format library=work;
value $test 'z' = 'a'
'y' = 'b'
'x' = 'c';
run;
data test;
format val $test.;
informat val $1.;
input val $;
val_fmt = put(val,$test.);
datalines;
x
y
z
;
run;
proc print data=test(drop=val_fmt);
run;
proc sort data=test;
by val_fmt;
run;
proc print data=test(drop=val_fmt);
run;
Produces
Obs val
1 c
2 b
3 a
Obs val
1 a
2 b
3 c
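Since the goal in the question is a merge, the same idea extends to MERGE: add the formatted-value column to each dataset, sort both by it, and use it on the BY statement (with more than one key, add one such column per variable). A sketch with hypothetical dataset names (LEFT, RIGHT), variables (CODE, X, Y, CODE_FMT) and format (CODEF.):

```sas
proc format;
  value codef 1-2 = 'Low'
              3-4 = 'High';
run;

data left;
  input code x;
  code_fmt = put(code, codef.);   /* merge key built from the formatted value */
  datalines;
1 10
3 30
;
run;

/* a lookup table already keyed by the formatted text */
data right;
  input code_fmt :$4. y;
  datalines;
High 400
Low 100
;
run;

proc sort data=left;  by code_fmt; run;
proc sort data=right; by code_fmt; run;

data merged;
  merge left right;
  by code_fmt;
run;
```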