I want to count observations by class and num:
id class num
3 FE 351
3 FE 351
3 FE 352
How can I create a new variable like the count column in the table below?
id class num count
3 FE 351 2
3 FE 351 2
3 FE 352 1
Based on the question, this assumes that ID is not part of the grouping. If it is, add it to the GROUP BY and to the join condition.
First create a table of counts.
proc sql noprint;
create table counts as
select class,
num,
count(*) as count
from have
group by class, num;
quit;
Then join the counts back to the main table:
proc sql noprint;
create table want as
select a.id,
a.class,
a.num,
b.count
from have as a
left join
counts as b
on a.class = b.class
and
a.num = b.num;
quit;
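You can do it in one SQL statement. When the SELECT list contains columns that are not in the GROUP BY, PROC SQL remerges the group count back onto every row (the log notes the remerge):
proc sql noprint;
create table want as
select id, class, num, count(*) as count
from have
group by class, num;
quit;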
I am looking to join two tables together
Table 1 - The baseball dataset
DATA baseball;
SET sashelp.baseball
(KEEP = crhits);
RUN;
Table 2 - A table containing the percentiles of CRhits
PROC STDIZE
DATA = baseball
OUT=_NULL_
PCTLMTD=ORD_STAT
PCTLDEF=5
OUTSTAT=STDLONGPCTLS
(WHERE = (SUBSTR(_TYPE_,1,1) = "P"))
pctlpts = 1 TO 99 BY 1;
RUN;
I would like to join these tables to create a table that contains the values of crhits and a column identifying which percentile each value belongs to, as shown below:
crhits percentile percentile_value
54 p3 54
66 p5 66
825 p63 825
1134 p76 1133
The last column is the percentile value given by stdlongpctls.
I currently use the following code to calculate the percentiles and a loop to count the number of "Events" per percentile, per factor.
I have tried a cross join, but I am having trouble seeing how to join these two tables without an explicit key:
PROC SQL;
CREATE TABLE cross_join_table AS
SELECT
a.crhits
, b._TYPE_
, CASE WHEN
a.crhits < b.crhits THEN b._TYPE_ END AS percentile
FROM
baseball a
CROSS JOIN
stdlongpctls b;
QUIT;
If there is an easier or more efficient way to find the number of observations and the number of events per percentile group, I would appreciate it. (In my actual dataset I am modelling a default flag, so the event count is the sum of 1's per percentile group.)
Use PROC RANK to assign each value to a percentile group instead. With GROUPS=100 the group numbers run from 0 to 99.
proc rank data=sashelp.baseball out=baseball_ranks groups=100;
var crhits;
ranks rank_crhits;
run;
You can then summarize it using PROC MEANS.
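As a rough sketch of that summary step, the code below ranks crhits into 100 groups and then uses PROC MEANS to get the number of observations and the range of crhits per group. The output data set name rank_summary and the statistic names n_obs, min_crhits and max_crhits are made up for illustration. For an event flag like the one described in the question, a SUM= statistic on that 0/1 variable would presumably give the event count, but sashelp.baseball has no such flag, so it is not shown.

```sas
/* Assign each observation to one of 100 rank groups (0-99). */
proc rank data=sashelp.baseball out=baseball_ranks groups=100;
    var crhits;
    ranks rank_crhits;
run;

/* One output row per rank group; NWAY keeps only the full
   class-level rows and drops the overall total row. */
proc means data=baseball_ranks noprint nway;
    class rank_crhits;
    var crhits;
    output out=rank_summary (drop=_type_)
        n=n_obs          /* non-missing crhits values in the group */
        min=min_crhits   /* smallest crhits in the group           */
        max=max_crhits;  /* largest crhits in the group            */
run;

proc print data=rank_summary;
run;
```

Tied values receive the same rank, so some groups can be empty or larger than others.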
I am looking for a way to produce a frequency table displaying the number of unique values of variable ID per unique value of variable Subclass.
I would like to order the results by variable Class.
Preferably, I would like to display the number of unique ID values per Subclass as a fraction of n for ID. In the Want example below, this value is displayed under %totalID.
In addition, I would like to display the number of unique ID values per Subclass as a fraction of the sum of unique ID values found within each Class. In the Want example below, this value is displayed under %withinclassID.
Have:
ID Class Subclass
-------------------------------
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
Want:
Class  Subclass  Unique number of IDs  %totalID  %withinclassID
----------------------------------------------------------------
1
       1a        3                     100.0     50.00
       1b        2                     66.67     33.33
       1c        1                     33.33     16.67
       SUM       6
2
       2a        3                     100.0     75.00
       2b        1                     33.33     25.00
       SUM       4
3
       3a        2                     66.67     66.67
       3b        1                     33.33     33.33
       SUM       3
My initial approach was to perform a PROC FREQ on NLEVELS producing a frequency table for number of unique IDs per subclass. Here however I lose information on class. I therefore cannot order the results by class.
My second approach involved using PROC TABULATE. I however cannot produce any percentage calculations based on unique counts in such a table.
Is there a direct way to tabulate the frequencies of one variable according to a second variable, grouped by a third variable--displaying overall and within group percentages?
You can use a double PROC FREQ or PROC SQL.
/*This demonstrates how to count the number of unique occurrences of a variable
across groups. It uses the SASHELP.CARS dataset, which is available with any SAS installation.
The objective is to determine the number of unique car makers by origin.
Note: The SQL solution can be off if you have a large data set, and these are not the only two ways to calculate distinct counts.
If you're dealing with a large data set, other methods may be appropriate.*/
*Count distinct IDs;
proc sql;
create table distinct_sql as
select origin, count(distinct make) as n_make
from sashelp.cars
group by origin;
quit;
*Double PROC FREQ;
proc freq data=sashelp.cars noprint;
table origin * make / out=origin_make;
run;
proc freq data=origin_make noprint;
table origin / out= distinct_freq outpct;
run;
title 'PROC FREQ';
proc print data=distinct_freq;
run;
title 'PROC SQL';
proc print data=distinct_sql;
run;
The NLEVELS option in PROC FREQ can produce the unique count you're after without losing data, provided you include the Class and Subclass variables in the BY statement. That also means you'll have to presort the data by the same variables.
Then you could try PROC TABULATE to get the rest of your requirement.
data have;
input ID $ Class Subclass $;
datalines;
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
;
run;
proc sort data=have;
by class subclass;
run;
ods output nlevels = unique_id_count;
proc freq data=have nlevels;
by class subclass;
run;
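If PROC SQL is acceptable, the counts and both percentages can also be sketched in a few queries. This is only one possible approach, not the NLEVELS/TABULATE route described above. The names total_id, sub_counts, n_id, pct_total_id and pct_within_class are invented for this example. It counts distinct IDs overall, then per Class/Subclass, and finally lets PROC SQL remerge the per-Class sum onto each Subclass row.

```sas
data have;
    input ID $ Class Subclass $;
    datalines;
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
;
run;

proc sql noprint;
    /* Overall number of distinct IDs, stored in a macro variable. */
    select count(distinct id) into :total_id from have;

    /* Distinct IDs per Class/Subclass combination. */
    create table sub_counts as
    select class, subclass, count(distinct id) as n_id
    from have
    group by class, subclass;

    /* sum(n_id) is computed per Class and remerged onto every row. */
    create table want as
    select class,
           subclass,
           n_id,
           100 * n_id / &total_id as pct_total_id     format=6.2,
           100 * n_id / sum(n_id) as pct_within_class format=6.2
    from sub_counts
    group by class
    order by class, subclass;
quit;
```

Note that the Have data contains a 1d row that the Want table does not list, so the Class 1 figures from this sketch will not match the Want table exactly.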
I would like to create a new column and assign a value based on a combination of 3 variables. For example, if Battery Life is 4, RAM (GB) is 3 and HD Size (GB) is 40, then assign 80 to the new variable 'Laptop_Class'. There are other combinations as well. How do I do that using PROC SQL in SAS?
You can do this in PROC SQL, but a Data Step will be less code and more intuitive.
data have;
set have;
if battery_life = 4 and ram = 3 and hd_size = 40 then
new_var=80;
else if .... then
...
run;
The SQL alternative to the DATA step IF/THEN is CASE WHEN:
proc sql;
create table test as select
t1.*,
case
when t1.battery_life = 4 and t1.ram = 3 and t1.hd_size = 40 then 80
when ......... then 70
when ......... then 60
end as new_column
from source_table t1;
quit;
Here is the structure of my data:
I only need ID 2, since all of its records have status A (alive), so I am trying to delete ID 1:
ID sex status
1 2 A
1 2 A
1 2 A
1 2 D
2 1 A
2 1 A
2 1 A
If you truly want to delete the records from your source dataset, you can do so with:
PROC SQL;
DELETE FROM MyData WHERE ID = 1;
QUIT;
However, if you want to keep the source dataset as-is (maybe you will use it again), it would be best to create a new dataset from it, like so:
PROC SQL;
CREATE TABLE MyFilteredData AS
SELECT ID, sex, status
FROM MyData
WHERE ID = 2;
QUIT;
or
DATA MyFilteredData;
SET MyData;
IF ID = 2;
RUN;
Alternatively, delete every record whose ID is not 2:
proc sql;
delete from your_data where id ~= 2;
quit;
This PROC SQL will create a new dataset Want from the original dataset Have including only IDs that have no status="D":
proc sql;
create table Want as
select *
from Have
where ID not in
(select distinct ID
from Have
where status="D")
;
quit;
I am trying to make a simpler unique identifier from an existing identifier. Starting with just an ID column, I want to make a new, simpler ID column so the final data looks like what follows. There are over 1 million IDs, so writing IF/THEN statements isn't an option. Maybe a DO statement?
ID NEWid
1234 1
3456 2
1234 1
6789 3
1234 1
A trivial data step solution not using monotonic().
proc sort data=have;
by id;
run;
data want;
set have;
by id;
if first.id then newid+1;
run;
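The sort-based step above changes the row order of the data. If the original order matters, one alternative (not from the answers here, just a sketch) is a DATA step hash object that remembers each ID the first time it is seen. The counter name next_id is made up for this example, and the approach assumes the distinct IDs fit in memory.

```sas
data have;
    input id;
    datalines;
1234
3456
1234
6789
1234
;
run;

data want;
    /* Build an empty lookup table once: key = id, data = newid. */
    if _n_ = 1 then do;
        declare hash h();
        h.defineKey('id');
        h.defineData('newid');
        h.defineDone();
    end;

    set have;

    /* find() returns 0 and loads newid when the id is already known. */
    if h.find() ne 0 then do;
        next_id + 1;       /* sum statement: retained across rows */
        newid = next_id;
        h.add();           /* remember this id with its new number */
    end;

    drop next_id;
run;
```

On the sample IDs from the question, this numbers IDs in order of first appearance and keeps the rows in their original order.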
Using PROC SQL:
(You can probably do this without the intermediate datasets by using subqueries, but monotonic() is an undocumented function and sometimes doesn't act the way you'd think in a subquery.)
proc sql noprint;
create table uniq_id as
select distinct id
from original
order by id
;
create table uniq_id2 as
select id, monotonic() as newid
from uniq_id
;
create table final as
select a.id, b.newid
from original a, uniq_id2 b
where a.id = b.id
;
quit;