I have a data that has column A with following data
Column A
--------
1
2
?
2
I used the query:
proc sql;
select
if A= '?' then A=., count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
This is not working. Please suggest how can i replace the ? to any standard value?
In SQL it's a CASE statement, not IF/THEN.
proc sql;
select
case when a='?' then .
else a end as a, count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
Or you could use an IFC() function as well.
proc sql;
select
ifc(a='?', ., a) as a, count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
Column A contains "?" so it is character valued. The #reeza code should be then "" or ifc(a='?',"", a). Also, if you do not also select the grouping variable the context of the N_obs is lost.
Suggest
data have;
input a $ nsp ;
datalines;
1 2
2 3
? 7
2 7
run;
proc sql;
select
nsp
, case when a='?' then '' else a end as a
, count(*) as nsp_count
from have
group by nsp
;
quit;
The query will also log the message NOTE: The query requires remerging summary statistics back with the original data. as Proc SQL is performing an automatic remerge of group aggregates with individual rows within the group.
Related
Below given dataset I am trying to find New Users Vs Repeated Users.
DATE ID Unique_Event
20200901 a12345 1
20200902 a12345 1
20200903 b12345 1
20200903 a12345 1
20200904 c12345 1
In the above dataset, since a12345 appeared on multiple dates, should be counted as a "repeated" user whereas b12345 only appeared once, so he is a "new" user. Please note, this is only sample data as the actual data is quite large. I tried the below code, but I am not getting the correct count. Ideally, tot_num_users-num_new_users should be repeated users, but I am getting incorrect counts. Am I missing something?
Expected Output:
Month new_users repeated_users
9 2 1
Code:
data user_events;
set user_events;
new_date=input(date,yymmdd10.);
run;
proc sql;select month(new_date) as mm,
count(distinct vv.id) as total_num_users,
count(distinct case when v.new_date = vv.minva then v.id end) as num_new_users,
(count(distinct vv.id) - count(distinct case when v.new_date = vv.minva then id end)
) as num_repeated_users
from user_events v inner join
(select t.id, min(new_date) as minva
from user_events t
group by t.id
) vv
on v.id = vv.id
group by 1
order by 1;quit;
In a sub-select, for each ID you can count the number of distinct DATE to determine the new / repeated status. The all ids aggregate computations are made from the sub-select.
proc sql;
create table freq as
select
count(*) as id_count
, sum (status='repeated') as id_repeated_count /* sum counts a logic eval state */
, sum (status='new') as id_new_count
from
( select
id
, case
when count(distinct date) > 1 then 'repeated'
else 'new'
end as status
from
user_events
group by
id
) as statuses
;
An alternative solution not using proc sql (though I'm aware you tagged this with "proc sql").
data final;
set user_events;
Month=month(new_date);
run;
proc sort data=final; by Month ID;
data final;
set final;
by Month ID;
if first.Month then do;
new_users=0;
repeated_users=0;
end;
if last.ID then do;
if first.ID then
new_users+1;
else
repeated_users+1;
end;
if last.Month then
output;
keep Month new_users repeated_users;
run;
Since you are using proc sql, this is a sql question, not a SAS question.
Try something like:
proc sql;
select ID,count(Unique_Event)
from <that table>
group by ID
order by ID
run;
After I run this SAS SQL process. The date variable "EXVISDAT" showed in the results in a numerical way. How can I change the code to let the date variable "EXVISDAT" show in format of "date9."?
proc sql;
create table test2 as
select VISIT, Targetdays, Targetdays + '01JUL2019'd - 1 as EXVISDAT
from data_in.chrono_visits_lts14424
order by Targetdays
;
quit;
You can read about it in documentation.
In yours case, query will look like:
proc sql;
create table test2 as
select VISIT, Targetdays, Targetdays + '01JUL2019'd - 1 as EXVISDAT format=date9.
from data_in.chrono_visits_lts14424
order by Targetdays
;
quit;
I have two codes one proc sql and another proc and datastep. Both are interlinked datasets.
Below is the proc sql lines.
create table new as select a.id,a.alid,b.pdate from tb a inner join
tb1 act on a.aid =act.aid left join tb2 as b on (r.alid=a.alid) where
a.did in (15,45); quit;
Below is the proc and datasteps created from above datatset new.
proc sort data = new uodupkey;
by alid;
data new1;
set new;
format ddate date9.
dat1=datepart(today);
datno=input(number,20.);
key=_n_;
rename alid blid;
run;
proc sort data=new1 nodupkey;
by datno dat1;
run;
I need to put everything into single proc sql step.
You mention two data steps but I only see one.
Anyway, your data step and proc sort can indeed be written in one sql query (which you can then insert in your proc sql):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from new1
group by datno, dat1
having key=min(key)
;
quit;
One remark though. Your data step expects variables called ddate,today and number in your input dataset new. If that dataset is supposed to be the result of your first sql query, then those variables don't exist and their values along with those of dat1 and datno in new1 will always be missing.
Also I assume you misspelled nodupkey on your proc sort.
EDIT: or, to have it all in the same query (if that's what you meant with "the same proc sql"):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from (
select a.id,a.alid,b.pdate
from tb a
inner join tb1 act
on a.aid =act.aid
left join tb2 as b
on (r.alid=a.alid)
where a.did in (15,45)
)
group by datno, dat1
having key=min(key)
;
quit;
I know in teradata or other sql platforms you can find the count distinct of a combination of variables by doing:
select count(distinct x1||x2)
from db.table
And this will give all the unique combinations of x1,x2 pairs.
This syntax, however, does not work in proc sql.
Is there anyway to perform such a count in proc sql?
Thanks.
That syntax works perfectly fine in PROC SQL.
proc sql;
select count(distinct name||sex)
from sashelp.class;
quit;
If the fields are numeric, you must put them to character (using put) or use cat or one of its siblings, which happily take either numeric or character.
proc sql;
select count(distinct cats(age,sex))
from sashelp.class;
quit;
This maybe redundant, but when you mentioned "combination", it instantly triggered 'permutation' in my mind. So here is one solution to differentiate these two:
DATA TEST;
INPUT (X1 X2) (:$8.);
CARDS;
A B
B A
C D
C D
;
PROC SQL;
SELECT COUNT(*) AS TOTAL, COUNT(DISTINCT CATS(X1,X2)) AS PERMUTATION,
COUNT(DISTINCT CATS(IFC(X1<=X2,X1,X2),IFC(X1>X2,X1,X2))) AS COMBINATION
FROM TEST;
QUIT;
Trying to make a more simple unique identifier from already existing identifier. Starting with just and ID column I want to make a new, more simple, id column so the final data looks like what follows. There are 1million + id's, so it isnt an option to do if thens, maybe a do statement?
ID NEWid
1234 1
3456 2
1234 1
6789 3
1234 1
A trivial data step solution not using monotonic().
proc sort data=have;
by id;
run;
data want;
set have;
by id;
if first.id then newid+1;
run;
using proc sql..
(you can probably do this without the intermediate datasets using subqueries, but sometimes monotonic doesn't act the way you'd think in a subquery)
proc sql noprint;
create table uniq_id as
select distinct id
from original
order by id
;
create table uniq_id2 as
select id, monotonic() as newid
from uniq_id
;
create table final as
select a.id, b.newid
from original_set a, uniq_id2 b
where a.id = b.id
;
quit;