I am trying to create a scoring table by looking up a grading-system table. Three teachers grade all the students, and each has their own way of grading. I am trying to standardize the students' marks by mapping them to the lookup table. My tables look like this:
old grades table:
       prof_grade  TA_grade  chair_grade
Anne   A+          A         AAA
Peter  B+          B+        AA
Look up table1:
Score Rating Teacher
10 A+ prof
10 A TA
10 AAA chair
9 A prof
9 A- TA
9 AA chair
8 B+ prof
8 B+ TA
8 A chair
Look up table2:
Score  Prof  TA  chair
10     A+    A   AAA
9      A     A-  AA
8      B+    B+  A
The two lookup tables have the same contents, and I can use either one as the mapping table.
I want my new table to look like this:
new grades table:
       prof_grade  TA_grade  chair_grade  prof_score  TA_score  chair_score
Anne   A+          A         AAA          10          10        10
Peter  B+          B+        AA           8           8         9
I know I can do this with multiple joins, but that would make the code long, and it would take a long time to modify when more teachers are added to the lookup table. Hence I want a more automated way that does not use joins. I am thinking of using hash objects, but the Rating in lookup table 1 is not unique unless it is combined with the Teacher column. Maybe I can use proc IML to solve this problem? Is there an easy way to create such a table?
Just use proc format; it is simple and straightforward.
data have;
input name $ prof_grade $ TA_grade $ chair_grade $;
datalines;
Anne A+ A A+
Peter B+ B+ AAA
Pete A+ A- AA
;
/* your lookup table for creating informats*/
data lookup;
input Score Rating $ Teacher $;
datalines;
10 A+ prof
10 A TA
10 AAA chair
9 A prof
9 A- TA
9 AA chair
8 B+ prof
8 B+ TA
8 A+ chair
;
/* creating informat*/
proc sql ;
create table crfmt as
select distinct
Teacher as fmtname,
strip(Rating) as start,
score as label,
"J" as type
from lookup;
quit;
proc format library=work cntlin=crfmt fmtlib;
run;
/* Apply the informats created above. They are character informats, so the
first two scores are character values; wrap the call in a second input()
to get a number, as shown for chair_score. */
data want;
set have;
Prof_score = input(trim(prof_grade),$prof.);
TA_score = input(trim(TA_grade),$TA.);
/* to make it numeric value*/
chair_score = input(input(trim(chair_grade),$chair.),best32.);
run;
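If numeric scores are wanted directly, one variation is to build numeric informats (TYPE 'I') from the same lookup table and apply them in a loop, so that adding a teacher only means extending the arrays. This is only a sketch: the `_sc` suffix, the dataset names crinfmt and want_num, and the array layout are my own choices, and it assumes every grade in have appears in lookup (an unmatched grade would be read as an ordinary number and produce an invalid-data note).

```sas
/* Build a CNTLIN dataset of numeric informats, one per teacher. */
data crinfmt;
    length fmtname $32 start $8 label $12 type $1;
    set lookup;
    fmtname = cats(Teacher, '_sc');  /* hypothetical names: prof_sc, TA_sc, chair_sc */
    start   = strip(Rating);
    label   = cats(Score);           /* informat result, stored as text in CNTLIN */
    type    = 'I';                   /* I = numeric informat */
    keep fmtname start label type;
run;

/* Keep each informat's rows together before loading them. */
proc sort data=crinfmt;
    by fmtname start;
run;

proc format library=work cntlin=crinfmt;
run;

data want_num;
    set have;
    array g{3} prof_grade TA_grade chair_grade;
    array s{3} prof_score TA_score chair_score;
    array infmt{3} $32 _temporary_ ('prof_sc.' 'TA_sc.' 'chair_sc.');
    do i = 1 to 3;
        s{i} = inputn(g{i}, infmt{i});   /* numeric result, no second input() */
    end;
    drop i;
run;
```

INPUTN takes the informat name as a character value at run time, which is what lets one loop serve all three columns.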
Edit 1: if you want to handle other (unmatched) values, use the code below.
data have;
input name $ prof_grade $ TA_grade $ chair_grade $;
datalines;
Anne A+ A A+
Peter B+ B+ AAA
Pete A+ A- AA
Smith A+ A- AAA1A
;
/* your lookup table for creating informats*/
data lookup;
infile datalines missover;
input Score $ Rating $ Teacher $;
datalines;
10 A+ prof
10 A TA
10 AAA chair
9 A prof
9 A- TA
9 AA chair
8 B+ prof
8 B+ TA
8 A+ chair
;
/* insert rows in lookup to address other values*/
proc sql;
insert into lookup
values(" ", "Unknown" , "chair");
insert into lookup
values(" ", "Unknown" , "TA");
insert into lookup
values(" ", "Unknown" , "prof");
/* creating informat*/
proc sql ;
create table crfmt as
select distinct
Teacher as fmtname,
strip(Rating) as start,
score as label,
"J" as type
from lookup;
quit;
proc format library=work cntlin=crfmt fmtlib;
run;
/* Apply the informats created above. A character informat returns the value
unchanged when no range matches, so a result equal to the original grade
means "no match" and the score is set to blank. */
data want;
set have;
if input(trim(prof_grade),$prof.) eq prof_grade
then prof_score = ' ';
else prof_score = input(trim(prof_grade),$prof.);
if input(trim(TA_grade),$TA.) eq TA_grade
then TA_score = ' ';
else TA_score = input(trim(TA_grade),$TA.);
if input(trim(Chair_grade),$chair.) eq Chair_grade
then chair_score = ' ';
else chair_score = input(trim(chair_grade),$chair.);
run;
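For unmatched grades, PROC FORMAT also has an OTHER range, which avoids comparing the result with the original value. A hand-written sketch for the chair scale only, assuming the have dataset from Edit 1 (the informat name chair_oth and the dataset name want_other are hypothetical; the mappings are copied from the lookup data above):

```sas
proc format;
    invalue chair_oth
        'AAA' = 10
        'AA'  = 9
        'A+'  = 8
        other = .;      /* any other grade becomes a missing score */
run;

data want_other;
    set have;
    /* numeric result; a grade such as 'AAA1A' falls into OTHER */
    chair_score = input(chair_grade, chair_oth.);
run;
```

In a CNTLIN dataset, the same range is requested with a row whose HLO variable is 'O'.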
Related
I have two data sets INPUT and OUTPUT.
data INPUT;
input
id 1-4
var1 $ 6-10
var2 $ 12-17
var3 $ 19-22
transformation $ 24-26
;
datalines;
1023 apple banana oats 1:1
1049 12 22 8 2x
1219 milk cream fish 1:1
;
run;
The OUTPUT dataset has a different structure. The variables do not have the same name.
data work.output;
attrib
variable_1 length=8 format=best12. label="Variable 1"
variable_2 length=$50 format=$50. label="Variable 2"
Variable_3 length=8 format=date9. label="Variable 3";
stop;
run;
OUTPUT will be filled with the values from input based on what is specified in column "transformation" in table INPUT: when "transformation" equals "1:1", I want to fill the OUTPUT ds with the values of the corresponding INPUT dataset. If this were a small excel, I would do copy & paste or a lookup.
For example, obs1 of dataset INPUT has transformation = 1:1, so I want to fill variable_1 of dataset OUTPUT with "apple", variable_2 with "banana" and variable_3 with "oats".
For the second observation of ds INPUT I want to multiply each variable by two and assign the results to variable_1 - variable_3 respectively.
In my real dataset I have many more columns, so I need to automate this, probably via an index, since the variable names do not correspond.
You probably need to code each transformation rule separately.
This works for your example. But you did not include any date transformations so variable3 is not used.
data INPUT;
input
id 1-4
var1 $ 6-10
var2 $ 12-17
var3 $ 19-22
transformation $ 24-26
;
datalines;
1023 apple banana oats 1:1
1049 12 22 8 2x
1219 milk cream fish 1:1
;
proc transpose data=input prefix=value out=step1;
by id transformation;
var var1-var3 ;
run;
data output;
set step1;
length variable1 8 variable2 $50 variable3 8;
format variable3 date9.;
if transformation='1:1' then variable2=value1;
if transformation='2x' then variable1 = 2*input(value1,32.);
run;
Result
Obs   id    transformation  _NAME_  value1  variable1  variable2  variable3
1     1023  1:1             var1    apple   .          apple      .
2     1023  1:1             var2    banana  .          banana     .
3     1023  1:1             var3    oats    .          oats       .
4     1049  2x              var1    12      24                    .
5     1049  2x              var2    22      44                    .
6     1049  2x              var3    8       16                    .
7     1219  1:1             var1    milk    .          milk       .
8     1219  1:1             var2    cream   .          cream      .
9     1219  1:1             var3    fish    .          fish       .
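If the one-row-per-id shape should be kept, the same two rules can also be applied across the columns by position with arrays, so the source and target names need not correspond. A sketch, assuming the INPUT dataset above; the target names c_1-c_3 and n_1-n_3 and the dataset name output_wide are placeholders, not the asker's real variables:

```sas
data output_wide;
    set input;
    array src{*} var1-var3;            /* source columns, addressed by position */
    array c_out{3} $50 c_1-c_3;        /* hypothetical character targets */
    array n_out{3} n_1-n_3;            /* hypothetical numeric targets */
    do i = 1 to dim(src);
        select (transformation);
            when ('1:1') c_out{i} = src{i};                  /* copy as is */
            when ('2x')  n_out{i} = 2 * input(src{i}, 32.);  /* double the number */
            otherwise;                                       /* unknown rule: leave missing */
        end;
    end;
    drop i;
run;
```

Each new transformation rule still needs its own WHEN branch, but a longer variable list only changes the array definitions.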
The required condition is
"SUBJECT in A = SUBJECT in B
and
VISIT in A NE (not equal to) VISIT in B"
I would like to find the exact mismatched and missing VISIT values between Tables A and B below using PROC SQL. Can anyone help me, please?
Table A
SUBJECT Test VISIT
1001 ABCD 1
1001 ABCD 2
1001 ABCD 3
1001 ABCD 5
Table B
SUBJECT Test VISIT1
1001 ABCD 2
1001 ABCD 1
1001 ABCD 4
Expected output:
SUBJECT  Test  VISIT  VISIT1
1001     ABCD  3      .
1001     ABCD  5      .
1001     ABCD  .      4
Visits 3 and 5 are present in dataset A but not in B, and visit 4 is present in dataset B but not in A, and so on.
Code for the datasets:
DATA A;
LENGTH SUBJECT 8 Test $10 visit 8;
INPUT SUBJECT Test $ visit ;
DATALINES;
1001 ABCD 1
1001 ABCD 2
1001 ABCD 3
1001 ABCD 5
;
RUN;
DATA B;
LENGTH SUBJECT 8 Test $10 visit1 8;
INPUT SUBJECT Test $ visit1 ;
DATALINES;
1001 ABCD 2
1001 ABCD 1
1001 ABCD 4
;
RUN;
Thanks in advance!
The code I tried is below (but it does not work as expected):
****************(VISIT ) in A and not in B****;
proc sql;
create table SS1 as
select distinct a.* FROM
A a where a.visit not in(select s.visit1 from B s WHERE A.SUBJECT = S.SUBJECT );
create table INRAVE as
select * from SS1 A
left join
B B
on a.subject=b.SUBJECT and a.VISIT NE b.VISIT1
where b.SUBJECT is not null
;
quit;
****************VISIT in B and not in A****;
proc sql;
create table SS2 as
select distinct a.* from
B a where a.VISIT1 not in(select S.VISIT from A s WHERE A.SUBJECT = S.SUBJECT );
create table INVENDOR as
select * from SS2 A
left join
A B
on a.subject=b.SUBJECT and a.VISIT1 NE b.VISIT
where b.SUBJECT is not null
;
quit;
data ALL;;
set inrave invendor;
where subject=subject ;
RUN;
It seems you know SQL very well, so why not try union all, like this:
proc sql noprint;
create table C as
select *, 'A' as Source from A
where catx('#',SUBJECT,Test,visit) not in (
select distinct catx('#',SUBJECT,Test,visit1) from B
)
union all corr
select *, 'B' as Source from B(rename=(VISIT1=VISIT))
where catx('#',SUBJECT,Test,visit) not in (
select distinct catx('#',SUBJECT,Test,visit) from A
)
;
create table D(drop=TmpVISIT Source) as
select *,
case when Source = 'B' then . else TmpVISIT end as VISIT,
case when Source = 'B' then TmpVISIT else . end as VISIT1
from C(rename=(VISIT=TmpVISIT));
quit;
This takes every observation from dataset A that does not appear in dataset B, and does the opposite for dataset B.
I also have another solution, which is shorter:
proc sql noprint;
select catx('#',SUBJECT,Test,visit) into :Ununique separated by '" "' from (
select * from A union all select * from B(rename=(visit1=visit))
)
group by SUBJECT, Test, visit
having count(*) > 1;
quit;
data D;
set A B;
if catx('#',SUBJECT,Test,coalesce(visit1,visit)) in ("&Ununique") then delete;
run;
However, this method is limited by the maximum length of a macro variable.
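Another option, not from the code above but a common pattern for this kind of comparison, is a full join on all three keys that keeps only the rows with no partner on one side. A sketch, assuming datasets A and B exactly as created in the question (the table name mismatch is made up):

```sas
proc sql;
    create table mismatch as
    select coalesce(a.SUBJECT, b.SUBJECT) as SUBJECT,
           coalesce(a.Test, b.Test)       as Test,
           a.visit                        as VISIT,   /* missing when only in B */
           b.visit1                       as VISIT1   /* missing when only in A */
    from A as a
         full join B as b
           on  a.SUBJECT = b.SUBJECT
           and a.Test    = b.Test
           and a.visit   = b.visit1
    /* keep only the rows that found no match on the other side */
    where a.SUBJECT is missing or b.SUBJECT is missing;
quit;
```

This gives VISIT and VISIT1 as separate columns in one step and does not depend on macro variable length.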
I'm trying to concatenate two tables using a proc sql - union where certain variables are unique to each table. Is there a way to do this without using a NULL placeholder variable? Basically the equivalent of the following data step.
data total;
set t1 t2;
run;
A simple example of what I'm trying to do is shown below.
data animal;
input common $ Animal $ Number;
datalines;
a Ant 5
b Bird .
c Cat 17
d Dog 9
e Eagle .
f Frog 76
;
run;
data plant;
input Common $ Plant $ Number;
datalines;
g Grape 69
h Hazelnut 55
i Indigo .
j Jicama 14
k Kale 4
l Lentil 88
;
run;
proc sql;
(select animal.*, '' as plant from animal)
union all corresponding
(select plant.*, '' as animal from plant)
;
quit;
I'd like to be able to run the proc sql without having to create the plant and animal variables in the select statement.
You want outer union, not union all. That does what you expect (keeps all variables in either dataset). See Howard Schreier's excellent paper on SQL set theory for more information.
proc sql;
create table test as
select * from animal
outer union corr
select * from plant
;
quit;
OK, my last question: I am having a hard time formatting this.
data practice;
input
Datalines;
employee_id Name gender years dept salary Birthday
1 Mitchell, Jane A f 6 shoe 22,450 12/30/1960
2 Miller, Frances T f 8 appliance . 11/27/1965
3 Evans, Richard A m 9 appliance 42,900 02/15/1973
4 Fair, Suzanne K f 3 clothing 29,700 03/09/1958
5 Meyers, Thomas D m 5 appliance 33,700 10/22/1961
6 Rogers, Steven F m 3 shoe 27,000 09/12/1960
7 Anderson, Frank F m 5 clothing 33,000 03/09/1958
10 Baxter, David T m 2 shoe 23,900 11/25/1966
11 Wood, Brenda L f 3 clothing 33,000 01/14/1962
12 Wheeler, Vickie M f 7 appliance 31,500 12/23/1975
13 Hancock, Sharon T f 1 clothing 21,000 01/17/1972
14 Looney, Roger M m 10 appliance 31,500 06/09/1973
15 Fry, Marie E f 6 clothing 29,700 05/25/1967
;
run;quit;
Proc print data=practice;
run;quit;
My question is: is there a way to do this without having to count each individual space? Even when I do count, the data still does not print out properly. What am I doing wrong? Thanks in advance; this should be my last question, and afterwards I should be ready for this final.
If you don't assign a length, a character variable read with list input gets the default length of 8 bytes, so longer values are truncated. You can use the statement length var $w; before your input statement to set your own length. Using the option dsd tells SAS to use a comma as the default delimiter, to read strings enclosed in quotation marks as a single value, and to strip the quotes before saving the value. If you use blank spaces as your delimiter, make sure there are no blank spaces in front of each row below the datalines statement.
data practice;
infile datalines dsd; /* comma-delimited; quoted strings are kept whole */
length Name $50 dept $9;
/* the colon modifier reads Birthday up to the next delimiter */
input employee_id Name $ gender $ years dept $ salary $ Birthday :mmddyy10.;
format Birthday MMDDYY10.;
datalines;
1,"Mitchell, Jane A",f,6,shoe,"22,450",12/30/1960
2,"Miller, Frances T",f,8,appliance,,11/27/1965
;
run;
proc print data=practice;
run;
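The effect of leaving out the LENGTH statement can be seen with a tiny test. A sketch with a made-up dataset name (demo) and a single value borrowed from the dept column:

```sas
data demo;
    input dept $;    /* no LENGTH statement, so dept gets the default length 8 */
    datalines;
appliance
;
run;

proc print data=demo;   /* dept prints as "applianc": the ninth character is cut */
run;
```

Putting length dept $9; before the input statement keeps the full value.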
I have a dataset that looks something like this:
IDnum State Product Consumption
123 MI A 30
123 MI B 20
123 MI C 45
456 NJ A 15
456 NJ D 10
789 MI B 60
... ... ... ...
I would like to create a new dataset with one row for each IDnum and a new dummy variable for each product (my real dataset has close to 1000 products), along with its associated consumption. It would look something like this:
IDnum State Prod.A Cons.A Prod.B Cons.B Prod.C Cons.C Prod.D Cons.D
123 MI yes 30 yes 20 yes 45 no -
456 NJ yes 15 no - no - yes 10
789 MI no - yes 60 no - no -
... ... ... ... ... ... ... ... ... ...
Some variables, like "State", don't change within the same IDnum, but each row in the original dataset corresponds to one purchase, hence the change in the "product" and "consumption" variables for the same IDnum. I would like my new dataset to show all the consumption habits of each customer in a single row, but so far I have failed.
Any help would be greatly appreciated.
Without yes/no variables, it's really easy:
data input;
length State $2 Product $1;
input IDnum State Product Consumption;
cards;
123 MI A 30
123 MI B 20
123 MI C 45
456 NJ A 15
456 NJ D 10
789 MI B 60
;
run;
proc transpose data=input out=output(drop=_NAME_) prefix=Cons_;
var Consumption;
id Product;
by IDnum State;
run;
Adding the yes/no fields:
proc sql;/* from column names or alternatively
create it from source data directly if not taking too long */
create table work.products as
select scan(name, 2, '_') as product length=1
from dictionary.columns
where libname='WORK' and memname='OUTPUT'
and upcase(name) like 'CONS_%';
quit;
filename vars temp;/* write a temp file containing variable definitions
in desired order */
data _null_;
set work.products end=last;
file vars;
length str $40;
if _N_ = 1 then put 'LENGTH ';
str = catt('Prod_', product, ' $3');
put str;
str = catt('Cons_', product, ' 8');
put str;
if last then put ';';
run;
options source2;
data output2;
length IdNum 8 State $2;
%include vars;
set output;
array prod{*} Prod_:;
array cons{*} Cons_:;
drop i;
do i=1 to dim(prod);
if coalesce(cons(i), 0) ne 0 then prod(i)='yes';
else prod(i)='no';
end;
run;