SAS formatting datalines

OK, my last question: I am having a hard time formatting this.
data practice;
input
Datalines;
employee_id Name gender years dept salary Birthday
1 Mitchell, Jane A f 6 shoe 22,450 12/30/1960
2 Miller, Frances T f 8 appliance . 11/27/1965
3 Evans, Richard A m 9 appliance 42,900 02/15/1973
4 Fair, Suzanne K f 3 clothing 29,700 03/09/1958
5 Meyers, Thomas D m 5 appliance 33,700 10/22/1961
6 Rogers, Steven F m 3 shoe 27,000 09/12/1960
7 Anderson, Frank F m 5 clothing 33,000 03/09/1958
10 Baxter, David T m 2 shoe 23,900 11/25/1966
11 Wood, Brenda L f 3 clothing 33,000 01/14/1962
12 Wheeler, Vickie M f 7 appliance 31,500 12/23/1975
13 Hancock, Sharon T f 1 clothing 21,000 01/17/1972
14 Looney, Roger M m 10 appliance 31,500 06/09/1973
15 Fry, Marie E f 6 clothing 29,700 05/25/1967
;
run;quit;
Proc print data=practice;
run;quit;
My question: is there a way to do this without having to count each individual space? Even when I do count, the data still does not print properly. What am I doing wrong? Thanks in advance. This should be my last question; afterwards I should be ready for this final.

With list input, a character variable that has no declared length defaults to 8 bytes, so longer values are truncated. You can use the statement length var $w; before your input statement to set your own length. The dsd option on the infile statement tells SAS to use a comma as the default delimiter, to read a string enclosed in quotation marks as a single value (stripping the quotes before saving it), and to treat two consecutive delimiters as a missing value. Put the : modifier in front of an informat (for example :mmddyy10.) so that SAS reads each field up to the next delimiter instead of a fixed number of columns. If using blank spaces as your delimiter, make sure there are no blank spaces in front of each row below the datalines statement.
data practice;
infile datalines dsd;
length Name $50 dept $9;
/* the : modifier reads each field up to the next delimiter; */
/* comma10. turns "22,450" into the number 22450 */
input employee_id Name $ gender $ years dept $ salary :comma10. Birthday :mmddyy10.;
format salary comma10. Birthday MMDDYY10.;
Datalines;
1,"Mitchell, Jane A",f,6,shoe,"22,450",12/30/1960
2,"Miller, Frances T",f,8,appliance,,11/27/1965
;
run;
Proc print data=practice;
run;quit;
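To answer the "without counting spaces" part directly for the original space-delimited rows, here is a minimal sketch. It assumes every name consists of exactly three tokens (last name with its comma, first name, middle initial), as in the sample data. The helper variables last_name, first_name and mid_init are names introduced here for illustration. Plain list input splits on blanks, so no column positions are needed.

```sas
data practice;
  /* Declare character lengths up front so nothing is truncated at 8 bytes. */
  length last_name first_name $20 mid_init $1 gender $1 dept $9 Name $50;
  /* List input: each token ends at the next blank. The : modifier lets the
     informats read a whole token rather than a fixed number of columns. */
  input employee_id last_name $ first_name $ mid_init $ gender $ years dept $
        salary :comma10. Birthday :mmddyy10.;
  /* last_name still carries its trailing comma, e.g. "Mitchell," */
  Name = catx(' ', last_name, first_name, mid_init);
  format salary comma10. Birthday mmddyy10.;
  drop last_name first_name mid_init;
datalines;
1 Mitchell, Jane A f 6 shoe 22,450 12/30/1960
2 Miller, Frances T f 8 appliance . 11/27/1965
3 Evans, Richard A m 9 appliance 42,900 02/15/1973
;
run;

proc print data=practice;
run;
```

A single period in the salary position is read as a missing value. If some names had more or fewer than three parts, this token-by-token approach would shift the remaining fields, and a delimited layout like the one above would be the safer choice.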

Populating a dataset depending on the values of a variable in another dataset

I have two data sets INPUT and OUTPUT.
data INPUT;
input
id 1-4
var1 $ 6-10
var2 $ 12-17
var3 $ 19-22
transformation $ 24-26
;
datalines;
1023 apple banana oats 1:1
1049 12    22     8    2x
1219 milk  cream  fish 1:1
;
run;
The OUTPUT dataset has a different structure. The variables do not have the same name.
data work.output;
attrib
variable_1 length=8 format=best12. label="Variable 1"
variable_2 length=$50 format=$50. label="Variable 2"
Variable_3 length=8 format=date9. label="Variable 3";
stop;
run;
OUTPUT will be filled with values from INPUT according to the "transformation" column of INPUT: when "transformation" equals "1:1", I want to fill OUTPUT with the values of the corresponding INPUT observation. If this were a small Excel sheet, I would copy and paste or use a lookup.
For example, obs1 of dataset INPUT has transformation = 1:1, so I want to fill variable_1 of dataset OUTPUT with "apple", variable_2 with "banana" and variable_3 with "oats".
For the second observation of INPUT, I want to multiply each variable by two and assign the results to variable_1 - variable_3 respectively.
In my real dataset I have many more columns, so I need to automate this, probably via an index, since the variable names do not correspond.
You probably need to code each transformation rule separately.
This works for your example. But you did not include any date transformations so variable3 is not used.
data INPUT;
input
id 1-4
var1 $ 6-10
var2 $ 12-17
var3 $ 19-22
transformation $ 24-26
;
datalines;
1023 apple banana oats 1:1
1049 12    22     8    2x
1219 milk  cream  fish 1:1
;
proc transpose data=input prefix=value out=step1;
by id transformation;
var var1-var3 ;
run;
data output;
set step1;
length variable1 8 variable2 $50 variable3 8;
format variable3 date9.;
if transformation='1:1' then variable2=value1;
if transformation='2x' then variable1 = 2*input(value1,32.);
run;
Result
Obs  id    transformation  _NAME_  value1  variable1  variable2  variable3
1    1023  1:1             var1    apple   .          apple      .
2    1023  1:1             var2    banana  .          banana     .
3    1023  1:1             var3    oats    .          oats       .
4    1049  2x              var1    12      24                    .
5    1049  2x              var2    22      44                    .
6    1049  2x              var3    8       16                    .
7    1219  1:1             var1    milk    .          milk       .
8    1219  1:1             var2    cream   .          cream      .
9    1219  1:1             var3    fish    .          fish       .
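If the real table has many columns, the same rules can also be applied by position with arrays instead of transposing. This is a rough sketch, assuming the INPUT dataset above. The output names num_1-num_3 and chr_1-chr_3 are hypothetical, and numeric and character results are kept in separate arrays because a single SAS array cannot mix types.

```sas
data output_wide;
  set input;
  array src[3] $ var1-var3;              /* source columns, all character */
  array num_out[3] num_1-num_3;          /* hypothetical numeric targets */
  array chr_out[3] $50 chr_1-chr_3;      /* hypothetical character targets */
  do i = 1 to dim(src);
    select (transformation);
      when ('1:1') chr_out[i] = src[i];                  /* copy as is */
      when ('2x')  num_out[i] = 2 * input(src[i], 32.);  /* convert, then double */
      otherwise;                                         /* unknown rule: leave missing */
    end;
  end;
  drop i;
run;
```

Each new rule then becomes one more when clause, and widening the table only means changing the array bounds and variable lists.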

SAS lookup tables to match data

I am trying to create a scoring table by looking up a grading system table. Three teachers grade all the students, and each has their own way of grading. I am trying to standardize the students' marks by mapping them to the lookup table. My tables look like this:
old grades table:
       prof_grade  TA_grade  chair_grade
Anne   A+          A         AAA
Peter  B+          B+        AA
Look up table1:
Score  Rating  Teacher
10     A+      prof
10     A       TA
10     AAA     chair
9      A       prof
9      A-      TA
9      AA      chair
8      B+      prof
8      B+      TA
8      A       chair
Look up table2:
    Prof  TA  chair
10  A+    A   AAA
9   A     A-  AA
8   B+    B+  A
The two lookup tables have the same contents, and I can use either one as the mapping table.
I want my new table to look like this:
new grades table:
       prof_grade  TA_grade  chair_grade  prof_score  TA_score  chair_score
Anne   A+          A         AAA          10          10        10
Peter  B+          B+        AA           8           8         9
I know I can do this with multiple joins, but that would make the code long, and it would take a long time to modify whenever more teachers are added to the lookup table. Hence I want to find a more automated way without using joins. I am thinking of using hash objects, but the Rating in lookup table 1 is not unique unless it is combined with the Teacher column. Maybe I can use PROC IML to solve this problem? Is there an easy way to create such a table?
Just use proc format; it is simple and straightforward.
data have;
input name $ prof_grade $ TA_grade $ chair_grade $;
datalines;
Anne A+ A A+
Peter B+ B+ AAA
Pete A+ A- AA
;
/* your lookup table for creating informats*/
data lookup;
input Score Rating $ Teacher $;
datalines;
10 A+ prof
10 A TA
10 AAA chair
9 A prof
9 A- TA
9 AA chair
8 B+ prof
8 B+ TA
8 A+ chair
;
/* creating informat*/
proc sql ;
create table crfmt as
select distinct
Teacher as fmtname,
strip(Rating) as start,
score as label,
"J" as type
from lookup;
quit;
proc format library=work cntlin=crfmt fmtlib;
run;
/* Apply the informats created above. They are character informats (type J),
so prof_score and TA_score come back as character values; wrap the call in a
second INPUT to get a numeric value, as done for chair_score below. */
data want;
set have;
Prof_score = input(trim(prof_grade),$prof.);
TA_score = input(trim(TA_grade),$TA.);
/* to make it numeric value*/
chair_score = input(input(trim(chair_grade),$chair.),best32.);
run;
Edit 1: to handle grades that are not in the lookup table, use the code below.
data have;
input name $ prof_grade $ TA_grade $ chair_grade $;
datalines;
Anne A+ A A+
Peter B+ B+ AAA
Pete A+ A- AA
Smith A+ A- AAA1A
;
/* your lookup table for creating informats*/
data lookup;
infile datalines missover;
input Score $ Rating $ Teacher $;
datalines;
10 A+ prof
10 A TA
10 AAA chair
9 A prof
9 A- TA
9 AA chair
8 B+ prof
8 B+ TA
8 A+ chair
;
/* insert rows in lookup to address other values*/
proc sql;
insert into lookup
values(" ", "Unknown" , "chair");
insert into lookup
values(" ", "Unknown" , "TA");
insert into lookup
values(" ", "Unknown" , "prof");
quit;
/* creating informat*/
proc sql ;
create table crfmt as
select distinct
Teacher as fmtname,
strip(Rating) as start,
score as label,
"J" as type
from lookup;
quit;
proc format library=work cntlin=crfmt fmtlib;
run;
/* Apply the informats created above. When an informat hands the grade back
unchanged, no lookup row matched, so the score is set to blank. */
data want;
set have;
/* declare the length first; otherwise the first assignment (' ') would fix it at 1 */
length prof_score TA_score chair_score $8;
if input(trim(prof_grade),$prof.) eq prof_grade
then prof_score = ' ';
else prof_score = input(trim(prof_grade),$prof.);
if input(trim(TA_grade),$TA.) eq TA_grade
then TA_score = ' ';
else TA_score = input(trim(TA_grade),$TA.);
if input(trim(Chair_grade),$chair.) eq Chair_grade
then chair_score = ' ';
else chair_score = input(trim(chair_grade),$chair.);
run;
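The hash-object idea from the question also works once Teacher and Rating are used together as a composite key. This is a minimal sketch, assuming the first version of the lookup dataset (numeric Score) and the have dataset above. The names want_hash and who are made up for this example.

```sas
data want_hash;
  if _n_ = 1 then do;
    /* Never executed; only puts Score, Rating and Teacher into the PDV. */
    if 0 then set lookup;
    declare hash h(dataset: 'lookup');
    h.defineKey('Teacher', 'Rating');   /* composite key makes Rating unique */
    h.defineData('Score');
    h.defineDone();
  end;
  set have;
  array grades[3] prof_grade TA_grade chair_grade;
  array scores[3] prof_score TA_score chair_score;
  /* Teacher labels in the same order as the grade columns. */
  array who[3] $8 _temporary_ ('prof', 'TA', 'chair');
  do i = 1 to dim(grades);
    Teacher = who[i];
    Rating  = grades[i];
    if h.find() = 0 then scores[i] = Score;   /* key found */
    else scores[i] = .;                       /* grade not in lookup */
  end;
  drop i Teacher Rating Score;
run;
```

Adding a teacher then means adding one entry to each of the three arrays and the matching rows to lookup, with no further joins.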

SAS concatenate in SAS Data Step

I don't know how to describe this question, but here is an example. I have an initial dataset that looks like this:
input first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
...
;
I want an output dataset like this:
input first second $;
cards;
1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"
...
;
Both tables will have two columns. The unique values in the column "first" can range from 1 to any number.
Can someone help me?
Something like below:
proc sort data=have;
by first second;
run;
data want(rename=(b=second));
length new_second $50.;
do until(last.first);
set have;
by first second ;
new_second =catx(',', new_second, second);
b=quote(strip(new_second));
end;
drop second new_second;
run;
output is
first second
1 "A,B,C,D"
2 "E,F"
3 "A,S"
4 "C"
5 "Y"
6 "II,N,OO,UU"
7 "G,H"
You can use by-group processing and the retain statement to achieve this.
Create a sample dataset:
data have;
input id value $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
;
run;
First ensure that your dataset is sorted by your id variable:
proc sort data=have;
by id;
run;
Then use the first. and last. notation to identify when the id variable is changing or about to change. The retain statement tells the data step to keep the value of concatenated_value across observations rather than resetting it to blank. Use the catx() function to perform the actual concatenation and separate the values with a comma. Use the quote() function to wrap the result in " characters before outputting the record.
data want;
length concatenated_value $500;
set have;
by id;
retain concatenated_value ;
/* start a fresh list at the first row of each id */
if first.id then do;
concatenated_value = '';
end;
concatenated_value = catx(',', concatenated_value, value);
/* write one row per id once the list is complete */
if last.id then do;
concatenated_value = quote(cats(concatenated_value));
output;
end;
drop value;
run;
Output:
concatenated_value  id
"A,B,C,D"           1
"E,F"               2
"S,A"               3
"C"                 4
"Y"                 5
"II,UU,OO,N"        6
"G,H"               7

Transform numbers with 0 values at the beginning

I have the following dataset:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
;
PROC PRINT; RUN;
I want to link this data to another table but the thing is that the numbers in the other table are stored in the following format: 0012, 0023, 0023.
So I am looking for a way to do the following:
Check how long the number is
If length = 1, add 3 0 values to the beginning
If length = 2, add 2 0 values to the beginning
Any thoughts on how I can get this working?
Numbers are numbers so if the other table has the field as a number then you don't need to do anything. 13 = 0013 = 13.00 = ....
If the other table actually has a character variable then you need to convert one or the other.
char_number = put(number, Z4.);
number = input(char_number, 4.);
You can use z#. formats to accomplish this:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
9999 999
8888 8
;
data survey2;
set survey;
number_long = put(number, z4.);
run;
If number is itself stored as a character variable and you need it padded to four characters, convert it to numeric first and then apply the format:
want = put(input(number,best32.),z4.);
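Putting this together for the original goal of linking the two tables, here is a small sketch. It assumes the other table stores the key as a character variable. The dataset other and its columns code and descr are hypothetical stand-ins for that second table.

```sas
/* Hypothetical second table with zero-padded character keys. */
data other;
  input code $ descr $;
datalines;
0012 first
0023 second
;
run;

proc sql;
  create table linked as
  select s.zip_code, s.number, o.descr
  from survey as s
       left join other as o
       /* pad the numeric key to four digits only for the comparison */
       on put(s.number, z4.) = o.code;
quit;
```

Converting inside the join condition leaves the survey table unchanged, so number stays numeric for any later arithmetic.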

Stata: convert a matrix to dataset without losing names

This question has been asked before but the answers do not seem to apply here. I would like to make a dataset from my regression output, without losing information. Consider:
clear *
input str3 iso3 var1 var2 var3
GBR 10 13 15
USA 9 7 4
FRA 8 8 7
BEL 3 4 5
end
local vars var2 var3
reg var1 var2 var3
matrix A=r(table)
matrix list A
clear
xsvmat A, names(col) norestore
Stata complains about the _cons column here. I'm not interested in this column (although I also don't understand why it is such a problem to include it), but I can't find an option to cope with this in the xsvmat, svmat or svmat2 help.
Although Stata variable names can usually start with an underscore _, [U] 11.3 Naming conventions explains that _cons is a reserved name, and reserved names can't be used as variable names.
I think you want this:
clear
set more off
input ///
str3 iso3 var1 var2 var3
GBR 10 13 15
USA 9 7 4
FRA 8 8 7
BEL 3 4 5
end
local vars var2 var3
reg var1 var2 var3
matrix A = r(table)
// get original row names of matrix (and row count)
local rownames : rowfullnames A
local c : word count `rownames'
// get original column names of matrix and substitute out _cons
local names : colfullnames A
local newnames : subinstr local names "_cons" "cons", word
// rename columns of matrix
matrix colnames A = `newnames'
// convert to dataset
clear
svmat A, names(col)
// add matrix row names to dataset
gen rownames = ""
forvalues i = 1/`c' {
replace rownames = "`:word `i' of `rownames''" in `i'
}
// check
order rownames
list, sep(0)
Extended macro functions are used. See help extended_fcn if you're not familiar with them.
See also this answer, which is very similar, and suggests postfile and statsby.
Finally, check ssc describe estout, if your goal is to output regression tables.