I have a data set that looks like this:
Stop Group
JFK A
JFK B
JFK C
AMS A
AMS B
AMS C
LHR A
SFO B
I'm trying to generate a new data set where each Stop will have values A, B and C. For example, JFK and AMS already have A-C, so no change is needed. LHR needs B and C added and SFO needs A and C added. The output dataset should look like this:
JFK A
JFK B
JFK C
AMS A
AMS B
AMS C
LHR A
LHR B
LHR C
SFO A
SFO B
SFO C
Any ideas? Thanks!
This is a simple quick solution:
proc sql noprint;
  select distinct quote(stop) into :stop separated by ', '
    from have;
  select distinct quote(group) into :group separated by ', '
    from have;
quit;
data want;
  length stop $4 Group $2;
  do stop=&stop.;
    do Group=&group.;
      output;
    end;
  end;
run;
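The same grid of every Stop paired with every Group can also be built without macro variables. Below is a minimal sketch, assuming the input data set is named have and holds character variables Stop and Group as in the question. The DATA step that recreates have and the aliases s, g, h1 and h2 are only there for illustration.

```sas
/* Recreate the sample data from the question (illustration only) */
data have;
  input Stop $ Group $;
  datalines;
JFK A
JFK B
JFK C
AMS A
AMS B
AMS C
LHR A
SFO B
;
run;

/* Cross join the distinct stops with the distinct groups */
proc sql;
  create table want as
  select s.Stop, g.Group
  from (select distinct h1.Stop from have as h1) as s,
       (select distinct h2.Group from have as h2) as g
  order by s.Stop, g.Group;
quit;
```

The ORDER BY clause sorts the stops alphabetically, so the rows come out as AMS, JFK, LHR, SFO rather than in the order of the original data. PROC SQL may also write a note about a Cartesian product to the log, which is expected here because the cross join is intentional.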
Could you please give some advice on how to calculate different counts in different columns when grouping by a certain variable with proc report (if that is possible)?
I have copied an example and the desired result here to make clear what I want to achieve. I can build this table in SQL by computing each count individually (with where statements, for example where Building_code = 'A') and then joining the results into one table, but that is rather long, especially when I want to add more columns. Is there a way to define it in proc report or in a shorter data step or query? If yes, can you give a short example please?
Example:
Solution:
Thank you for your time.
This should work. There is absolutely no need to do this by joining multiple tables.
data have;
input Person_id Country :$15. Building_code $ Flat_code $ age_category $;
datalines;
1000 England A G 0-14
1001 England A G 15-64
1002 England A H 15-64
1003 England B H 15-64
1004 England B J 15-64
1005 Norway A G 15-64
1006 Norway A H 65+
1007 Slovakia A G 65+
1008 Slovakia B H 65+
;
run;
This is a solution in proc sql. It's not really long or complicated. I don't think you could do it any shorter using data step.
proc sql;
  create table want as
  select distinct country,
         sum(Building_code = 'A') as A_buildings,
         sum(Flat_code = 'G') as G_flats,
         sum(age_category = '15-64') as adults
  from have
  group by country
  ;
quit;
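The sum(...) expressions work because SAS evaluates a comparison to 1 when it is true and 0 when it is false, so summing the comparison counts the matching rows. A sketch of the same query with the condition spelled out as a CASE expression follows. It assumes the have data set created by the data step above; the table name want_case and the extra column persons (a plain row count per country) are hypothetical names added for illustration.

```sas
proc sql;
  create table want_case as
  select country,
         count(*) as persons,  /* all rows for the country */
         sum(case when Building_code = 'A' then 1 else 0 end) as A_buildings,
         sum(case when Flat_code = 'G' then 1 else 0 end) as G_flats,
         sum(case when age_category = '15-64' then 1 else 0 end) as adults
  from have
  group by country;
quit;
```

Adding another count is then a matter of adding one more line to the select list, without any extra joins.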
I have a dataset (BASE) with the following structure: a column with an index for every record, a column with a classification type, the classification value, and a column I'd like to populate.
NAME |CLASSIFICATION|VALUE|STANDARD VALUE
FIDO |ALFABET |F |
ALFA |STANDARD |2 |
BETA |STANDARD |5 |
ETA |MIXED |B65 |
THETA|MIXED |A40 |
Not all records have the same classification; however, I have an additional table (TRANSCODE) to convert the different classification methods into the standard one (the STANDARD column):
ALFABET|STANDARD|MIXED
A |1 |A1
B |5 |A30
C |3 |A40
D |5 |A31
E |8 |B65
F |6 |C54
My goal is to populate the fourth column with the corresponding value I can find in the second table (records with the standard classification will have the same value in both columns).
After that, my data should look like the following:
NAME |CLASSIFICATION|VALUE|STANDARD VALUE
FIDO |ALFABET |F |6
ALFA |STANDARD |2 |2
BETA |STANDARD |5 |5
ETA |MIXED |B65 |8
THETA|MIXED |A40 |3
To do so, I'm trying to do a proc sql update with a join condition, but it doesn't seem to work:
proc sql;
update BASE
left join TRASCODE
on BASE.VALUE= (
case
when BASE.CLASSIFCATION = 'ALFABET' then TRANSCODE.ALFABET
when BASE.CLASSIFICATION= 'STANDARD' then TRANSCODE.STANDARD
when BASE.CLASSIFICATION= 'MIXED then TRANSCODE.MIXED
end
)
set BASE.STANDARD_VALUE = TRANSCODE.STANDARD
;
quit;
Can someone help me?
Thanks a lot
The standard value is selected by a lookup, and a PROC SQL UPDATE statement cannot join to another table, so you cannot join directly to TRANSCODE.
Try this UPDATE query that uses a different lookup selection for each classification:
data base;
  infile cards missover;
  /* $11 so that the 'NOTRANSCODE' fallback is not truncated to 8 characters */
  length NAME $8 CLASSIFICATION $8 VALUE $8 STANDARD_VALUE $11;
  input NAME $ CLASSIFICATION $ VALUE $ STANDARD_VALUE $;
  datalines;
FIDO ALFABET F
ALFA STANDARD 2
BETA STANDARD 5
ETA MIXED B65
THETA MIXED A40
;
run;
data transcode;
  input ALFABET $ STANDARD $ MIXED $;
  datalines;
A 1 A1
B 5 A30
C 3 A40
D 5 A31
E 8 B65
F 6 C54
;
run;
proc sql;
  update base
    set standard_value =
      case
        when classification = 'ALFABET' then (select standard from transcode where alfabet=value)
        when classification = 'MIXED' then (select standard from transcode where mixed=value)
        when classification = 'STANDARD' then value
        else 'NOTRANSCODE'
      end;
quit;
%let syslast = base;
I am using the code below to get the columns sorted dynamically after proc transpose. I have gone through a lot of solutions for this problem, but now I am getting an error when I run:
data work.AB ;
input name $ class $ dt $ gpa $;
datalines;
JOHN 1 201607 C-
JOHN 1 201608 C+
JOHN 1 201702 B-
JOHN 2 201608 A
NICK 1 201608 A
NICK 1 201707 A
MIKE 2 201608 B
MIKE 2 201607 B
MIKE 2 201707 B+
MIKE 2 201702 B
BOB 3 201702 D
BOB 3 201607 C
BOB 3 201707 C
;
proc sort data=work.AB;
by NAME ClASS dt;
run;
PROC TRANSPOSE DATA = AB OUT = ABC(drop=_name_) ;
BY nAME cLASS;
VAR GPA;
ID dt;
RUN ;
proc sql ;
create table test as
select name into : list separated by ' '
from dictionary.columns
where libname='WORK' and memname='ABC'
order by input(substr(name,anydigit(name)),best32.)
;
quit;
%put &list;
data want;
retain &list;
set ABC;
run;
The error that I get is:
22 GOPTIONS ACCESSIBLE;
WARNING: Apparent symbolic reference LIST not resolved.
23 %put &list;
&list
24 data want;
25 retain &list;
_
22
200
WARNING: Apparent symbolic reference LIST not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.
ERROR 200-322: The symbol is not recognized and will be ignored.
26 set ABC;
27 run;
Kindly suggest.
You cannot put the values from the same SELECT statement into both a dataset and macro variables. Remove "create table test as" from your SQL code.
You also might want to suppress some of the warnings, for example those caused by column names that contain no digits, by changing the query to:
proc sql noprint ;
select case when (not anydigit(name)) then -1
else input(substr(name,anydigit(name)),?32.)
end as order
, name
into :list
, :list separated by ' '
from dictionary.columns
where libname='WORK' and memname='ABC'
order by 1
;
quit;
%put &list;
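With the macro variable populated, the final step from the question should then run as intended. A minimal sketch, assuming WORK.ABC was created by the PROC TRANSPOSE step in the question and the query above has just been run in the same session (the PROC CONTENTS call is only an optional check added for illustration):

```sas
/* RETAIN placed before SET fixes the column order without changing any values */
data want;
  retain &list;
  set ABC;
run;

/* Optional check: list the variables of WANT in their stored order */
proc contents data=want varnum;
run;
```

Columns whose names contain no digits (name and class) get the sort key -1 in the query, so they stay in front of the transposed date columns.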
I have some survey data with possible responses, an example would be:
Q1
Person1 Yes
Person2 No
Person3 Missing
Person4 Multiple Marks
Person5 Yes
I need to calculate the frequencies by question, so that only the Yes/No (other questions have varied responses such as frequently, very frequently, etc) are counted in the totals - not the ones with Multiple Marks. Is there a way to exclude these using proc freq or another method?
Outcome:
Yes: 2
No: 1
Total: 3
Using proc freq, I'd do something like this:
proc freq data=have (where=(q1 in ("Yes", "No")));
tables q1 / out=want;
run;
Output:
Q1 Count Percent
No 1 33.333333333
Yes 2 66.666666667
Proc sql:
proc sql;
select
sum(case when q1 eq "Yes" then 1 else 0 end) as Yes
,sum(case when q1 eq "No" then 1 else 0 end) as No
,count(q1) as Total
from have
where q1 in ("Yes", "No");
quit;
Output:
Yes No Total
2 1 3
The best way to do this is using formats.
Rather than storing your data as character strings, you should be storing it as numeric variables. This allows you to use numeric missing values to code those values you don't consider proper responses; using formats allows you to have your cake and eat it too (i.e., it allows you to still have those nice pretty response labels).
Here's an example. To understand this, you need to understand SAS special missings. Note the missing statement tells SAS to consider a single "M" in the input as .M (and similar for D and R). I then show two PROC FREQ results, one with the missings excluded, one with them included, to show the difference.
proc format;
value YNQF
1 = 'Yes'
2 = 'No'
. = 'Missing'
.M= 'Multiple Marks'
.D= "Don't Know"
.R= "Refused"
;
run;
missing M R D;
data have;
input Q1 Q2 Q3;
format q1 q2 q3 YNQF.;
datalines;
1 1 2
2 1 R
. . 1
M 1 1
1 . D
;;;;
run;
proc freq data=have;
tables (q1 q2 q3);
tables (q1 q2 q3)/missing;
run;
I have a dataset (patients) as such:
Pat_ID Hos Date
A 11 1/1/2012
B 12 2/3/2012
B 13 2/3/2012
C 11 4/1/2012
C 11 4/5/2012
How do I count using proc sql such that the outcome looks something like this:
Pat_ID Visits
A 1
B 1
C 2
Since B has two visits on the same date, they are considered as only 1 visit, whereas C has 2 visits because they are on different dates.
proc sql;
select Pat_ID, count(distinct Date) as Visits
from patients
group by Pat_ID
order by Pat_ID asc;
quit;