How to retain values in a SAS dataset?

I have the following dataset
data have;
input pop$ district$ racemajor$;
cards;
color Aberdeen .
white Aberdeen .
Black Aberdeen .
Asian Aberdeen .
Black Adelaid Yes
Color Adelaid .
white Adelaid .
Asian Adelaid .
White Bellvill .
black Bellvill .
Asian Bellvill .
;
run;
Basically, I want to carry the value 'Yes' down through the rest of a district once racemajor is 'Yes' for that district, so that the result looks like the following:
data want;
color Aberdeen .
white Aberdeen .
Black Aberdeen .
Asian Aberdeen .
Black Adelaid Yes
Color Adelaid Yes
white Adelaid Yes
Asian Adelaid Yes
White Bellvill .
black Bellvill .
Asian Bellvill .
I know that I can use first. and a RETAIN statement to do this, and I tried the following. However, it does not seem to work:
data want;
set have;
if first.district and racemajor='Yes';
retain racemajor;
run;

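Try this. BY-group processing requires the data to be sorted by district, so sort first to be safe. Note that it copies racemajor from the first row of each district, so it relies on 'Yes' being on that first row.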
data NEW;
drop test;
SET HAVE;
by district;
retain test;
if first.district then test = racemajor;
racemajor=test;
run;
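If 'Yes' can appear on a row other than the first one in a district, one variation on the code above is to reset the retained value at the start of each district and fill it in once 'Yes' is seen. This is only a sketch, not the answerer's code: the names have_sorted and carry are made up for illustration, and it assumes racemajor is blank (read from '.') wherever it is not 'Yes'.

```sas
/* Sort first so that BY-group processing is valid. */
proc sort data=have out=have_sorted;   /* have_sorted: hypothetical name */
  by district;
run;

data new;
  set have_sorted;
  by district;
  length carry $8;                     /* carry: hypothetical helper variable */
  retain carry;
  if first.district then carry = ' ';  /* forget the previous district's value */
  if racemajor = 'Yes' then carry = racemajor;
  racemajor = carry;                   /* rows after the 'Yes' row inherit it */
  drop carry;
run;
```

Rows that come before the 'Yes' row within the same district stay blank with this approach.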

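This is a good example for the UPDATE trick: the master dataset is empty (obs=0), the same dataset is used as the transaction dataset, and you output every observation. Because UPDATE does not overwrite a value with a missing one by default, the last nonmissing racemajor carries forward within each district.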
data have;
input (pop district racemajor) ($);
cards;
color Aberdeen .
white Aberdeen .
Black Aberdeen .
Asian Aberdeen .
Black Adelaid Yes
Color Adelaid .
white Adelaid .
Asian Adelaid .
White Bellvill .
black Bellvill .
Asian Bellvill .
;;;;
run;
proc print;
run;
data want;
update have(obs=0) have;
by district;
output;
run;
proc print;
run;

Related

SAS proc import then proc format: ERROR: For format $xxxxx, this range is repeated, or values overlap: C4311-C4311

I am a SAS novice and I have encountered this issue. I already referred to several posts including this: [SAS Formats]ERROR: For format COUNTRIES, this range is repeated, or values overlap: .-.
I used the following code block to export a particular entry (Ias1012324y22y23mc) from my SAS catalog:
libname perm '<path>';
filename tempfile '<filename>.csv' ;
proc FORMAT FMTLIB LIB=formats.formats cntlout=sasuser.fmtdata;
select $Ias1012324y22y23mc;
run;
proc export data=sasuser.fmtdata outfile=tempfile dbms=csv replace;
run;
quit;
My intention is to make a few changes and import the result into a different catalog. To verify the round trip first, I am importing the exact same CSV file unchanged, but I still run into this error:
ERROR: For format $IAS1012324Y22Y23MC, this range is repeated, or values overlap: C4311-C4311
Here is my import script:
libname perm '<path>';
filename tempfile '<filename>.csv' ;
PROC IMPORT
datafile=tempfile OUT=updated DBMS=CSV REPLACE;
GETNAMES=YES;
RUN;
proc format library=perm.library fmtlib cntlin=updated;
select $IAS1012324Y22Y23MC;
run;
quit;
I also tried building a control data set, as described here, with no luck: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n03qskwoints2an1ispy57plwrn9.htm
PROC IMPORT
datafile=tempfile OUT=updated DBMS=CSV REPLACE;
GETNAMES=YES;
RUN;
data ctrl;
retain fmtname '$IAS1012324Y22Y23MC';
length FMTNAME $32. START $9. END $9. LABEL $23. PREFIX $2. FILL $1. TYPE $1. SEXCL $1. EEXCL $1. HLO $13. DECSEP $1. DIG3SEP $1. DATATYPE $8. LANGUAGE $8.;
set updated;
* proc print;
run;
proc format library=perm.library fmtlib cntlin=ctrl;
select $IAS1012324Y22Y23MC;
run;
quit;
Here is my dataset where the overlap is happening
FMTNAME,START,END,LABEL,MIN,MAX,DEFAULT,LENGTH,FUZZ,PREFIX,MULT,FILL,NOEDIT,TYPE,SEXCL,EEXCL,HLO,DECSEP,DIG3SEP,DATATYPE,LANGUAGE
IAS1012324Y22Y23MC,C4310,C4310,23,1,40,4,4,0,,0,,0,C,N,N,,,,,
IAS1012324Y22Y23MC,C43111,C43111,23,1,40,4,4,0,,0,,0,C,N,N,,,,,
IAS1012324Y22Y23MC,C43112,C43112,23,1,40,4,4,0,,0,,0,C,N,N,,,,,
IAS1012324Y22Y23MC,C43121,C43121,23,1,40,4,4,0,,0,,0,C,N,N,,,,,
IAS1012324Y22Y23MC,C43122,C43122,23,1,40,4,4,0,,0,,0,C,N,N,,,,,
IAS1012324Y22Y23MC,C4320,C4320,23,1,40,4,4,0,,0,,0,C,N,N,,,,,
Clearly the issue seems to be the default width for START: when I imported the data, it came in as 4. I edited the CSV file and changed the DEFAULT column to 9, but the issue remains.
Update
Here is the generated data step after adding GUESSINGROWS=MAX; still the same issue.
data WORK.UPDATED ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'F:\SAS Programs\RAF2024InitialModel\import-model-sg\data-in-ascii.txt' delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=2 ;
informat FMTNAME $18. ;
informat START $9. ;
informat END $9. ;
informat LABEL best32. ;
informat MIN best32. ;
informat MAX best32. ;
informat DEFAULT best32. ;
informat LENGTH best32. ;
informat FUZZ best32. ;
informat PREFIX $1. ;
informat MULT best32. ;
informat FILL $1. ;
informat NOEDIT best32. ;
informat TYPE $1. ;
informat SEXCL $1. ;
informat EEXCL $1. ;
informat HLO $1. ;
informat DECSEP $1. ;
informat DIG3SEP $1. ;
informat DATATYPE $1. ;
informat LANGUAGE $1. ;
format FMTNAME $18. ;
format START $9. ;
format END $9. ;
format LABEL best12. ;
format MIN best12. ;
format MAX best12. ;
format DEFAULT best12. ;
format LENGTH best12. ;
format FUZZ best12. ;
format PREFIX $1. ;
format MULT best12. ;
format FILL $1. ;
format NOEDIT best12. ;
format TYPE $1. ;
format SEXCL $1. ;
format EEXCL $1. ;
format HLO $1. ;
format DECSEP $1. ;
format DIG3SEP $1. ;
format DATATYPE $1. ;
format LANGUAGE $1. ;
input
FMTNAME $
START $
END $
LABEL
MIN
MAX
DEFAULT
LENGTH
FUZZ
PREFIX $
MULT
FILL $
NOEDIT
TYPE $
SEXCL $
EEXCL $
HLO $
DECSEP $
DIG3SEP $
DATATYPE $
LANGUAGE $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Don't let PROC IMPORT GUESS how to read your text file. Write your own code instead. If you do have to use PROC IMPORT to read a text file make sure to always use the GUESSINGROWS=MAX; statement so that it checks the whole file before deciding the type and length to use for each variable.
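As a rough sketch of what "write your own code" could look like for the CSV shown in the question: the lengths below are taken from the LENGTH statement in the question, and the tempfile fileref and perm.library catalog are the ones defined there. Reading only the first four columns and hard-coding TYPE='C' are assumptions about what this particular character format needs. Whether this clears the overlap error depends on what is actually in the file.

```sas
/* Read the control data set by hand instead of letting PROC IMPORT guess. */
data ctrl;
  infile tempfile dsd truncover firstobs=2;   /* firstobs=2 skips the header row */
  length fmtname $32 start $9 end $9 label $23 type $1;
  input fmtname start end label;              /* first four CSV columns only */
  type = 'C';                                 /* assumption: character format */
run;

proc format library=perm.library cntlin=ctrl;
run;
```

Defining the lengths up front means START and END cannot be silently cut short because the first rows happen to hold shorter values.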

Formatting a Stata table like a table in SAS

I have a 3-way table in Stata that looks like this:
I would like to format this 3-way crosstab like a table in SAS that looks like this:
The actual output in the table isn't important, I just want to know how I can change the formatting of the Stata table. Any help is appreciated!
The groups command from the Stata Journal will get you most of the way. This reproducible example doesn't exhaust the possibilities.
. webuse nlswork, clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. groups union race , show(f F p P) sepby(union)
  +--------------------------------------------------+
  | union    race   Freq.     #<=   Percent      %<= |
  |--------------------------------------------------|
  |     0   white   10777   10777     56.02    56.02 |
  |     0   black    3784   14561     19.67    75.69 |
  |     0   other     167   14728      0.87    76.56 |
  |--------------------------------------------------|
  |     1   white    2817   17545     14.64    91.20 |
  |     1   black    1649   19194      8.57    99.77 |
  |     1   other      44   19238      0.23   100.00 |
  +--------------------------------------------------+
The command must be installed before you can use it. groups is a lousy search term, but this search will find the 2017 write-up and later updates of the software (at the time of writing, just one in 2018).
. search st0496, entry
Search of official help files, FAQs, Examples, and Stata Journals
SJ-18-1 st0496_1 . . . . . . . . . . . . . . . . . Software update for groups
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q1/18 SJ 18(1):291
groups exited with an error message if weights were specified;
this has been corrected
SJ-17-3 st0496 . . . . . Speaking Stata: Tables as lists: The groups command
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):760--773
presents command for listing group frequencies and percents and
cumulations thereof; for various subsetting and ordering by
frequencies, percents, and so on; for reordering of columns;
and for saving tabulated data to new datasets

SAS INFILE DATALINES truncates at 8 characters

Having not worked with SAS for a couple of years, I am trying to get back into it...
I am trying to read data with comma-delimited datalines. While there are plenty of examples, I can't quite get the following to import my data correctly:
data h0;
infile datalines delimiter=',';
input
kst
kst_bez $
hx $
hx_bez $
hxx $
hxx_bez $
hxxx $
hxxx_bez $
;
datalines;
10000,Team 1 South,H0,Group,H10,Retail,H112,Retail Germany
10001,Team 2 North & West,H0,H10,Retail Division 2,H112,Retail Germany
10003,Human Res,H0,Group,H20,HR,H112,HR Germany
;
I would have thought that delimiter=',' tells SAS to simply read the data between my commas into something like a VARCHAR variable. However, any alphanumeric data is truncated at 8 characters.
I vaguely remember that I have to use something like $varying40., which is in line with the examples I found. However, if I add this to my variables, the variable doesn't stop at the comma, but instead reads the whole, say, 40 characters.
Any hints?
Thanks a ton!
If you don't define them otherwise, SAS will default all character variables to length 8. It is probably clearer for you and the SAS compiler if you explicitly define the variables using a LENGTH or ATTRIB statement before using them. Otherwise SAS has to guess how you wanted them defined based on how they are first used.
data h0;
length kst 8 kst_bez $20 hx $20 hx_bez $20 hxx $20 hxx_bez $20
hxxx $20 hxxx_bez $20
;
infile datalines dsd truncover ;
input kst -- hxxx_bez ;
datalines;
...
You could add in-line informat specifications to the INPUT statement as the first use of the variable and SAS will default to the width of the informat used, but make sure to add the colon prefix to prevent SAS from reading past the delimiters.
data h0;
infile datalines dsd truncover ;
input kst kst_bez :$20. hx :$20. hx_bez :$20. hxx :$20. hxx_bez :$20.
hxxx :$20. hxxx_bez :$20.
;
datalines;
...

Default behavior of Input Buffer in SAS while reading data from external file

Contents of a.txt
22
333
4444
55555
But when I run this code:
data numbers;
infile 'c:\a.txt';
input var 5.;
/* list */ ;
run;
the data in numbers.sas is saved as :
333
55555
** Note the format of the data in numbers.sas and the format in a.txt
But when I use LIST, the input buffer looks something like this:
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
2 333 3
4 55555 5
Why doesn't SAS show 1 and 3? And how is the input buffer being read?
Please explain.
Try adding TRUNCOVER to your INFILE statement, or remove the 5. from your INPUT statement. With the 5. informat, SAS expects 5 characters. If the line in your source file is shorter than 5 characters, SAS continues reading on the next line (the default FLOWOVER behavior), which is why every other line is lost.
data numbers;
infile 'c:\a.txt' truncover;
input var 5.;
run;
For more information, read this paper on INFILE statement options.
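To see the difference side by side without depending on c:\a.txt, here is a small sketch that writes the four lines from the question to a temporary file and reads them back twice. The fileref a and the dataset names flow and trunc are made up for illustration.

```sas
/* Write the sample lines from the question to a temporary file. */
filename a temp;
data _null_;
  file a;
  put '22' / '333' / '4444' / '55555';
run;

/* Default FLOWOVER: a short line makes INPUT jump to the next line. */
data flow;
  infile a;
  input var 5.;
run;

/* TRUNCOVER: a short line is read as is, one observation per line. */
data trunc;
  infile a truncover;
  input var 5.;
run;

proc print data=flow;  run;
proc print data=trunc; run;
```

The first read should reproduce the behavior described in the question, and the log for it should also note that SAS went to a new line when the INPUT statement reached past the end of a line.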

Comparing datasets

I have two datasets: one containing the columns origin_zip (numeric), destination_zip (char), and tracking_number (char), and the other containing zip.
I would like to compare these 2 datasets so I can see all the tracking numbers and destination_zips that are not in the zip column of the second dataset.
Additionally I would like to see all of the tracking_numbers and origin_zips where the origin_zips = the destination_zips.
How would I accomplish this?
origin_zip destination_zip tracking_number
12345 23456 11111
34567 45678 22222
12345 12345 33333
zip
12345
34567
23456
results_tracking_number
22222
33333
Let's start with this...I don't think this completely answers your question, but follow up with comments and I will help if I can...
data zips;
input origin_zip $ destination_zip $ tracking_number $;
datalines;
12345 23456 11111
34567 45678 22222
56789 12345 33333
;
data zip;
input zip $;
datalines;
12345
54321
34567
76543
56789
;
Proc sort data=zips;
by origin_zip;
run;
Proc sort data=zip;
by zip;
run;
Data contained not_contained;
merge zip(in=a) zips(in=b rename=(origin_zip=zip));
by zip;
if a and b then output contained;
if a and not b then output not_contained;
run;
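For the two comparisons asked about in the question, a PROC SQL sketch against the zips and zip datasets created above might look like the following. The output table names dest_not_in_zip and same_zip are made up for illustration, and the code assumes all zip columns are character, as in the datalines above. If origin_zip is numeric, as in the original question, it would need converting first (for example with put(origin_zip, z5.)).

```sas
proc sql;
  /* Tracking numbers whose destination_zip is not in the zip list. */
  create table dest_not_in_zip as
    select tracking_number, destination_zip
    from zips
    where destination_zip not in (select zip from zip);

  /* Tracking numbers where origin and destination are the same. */
  create table same_zip as
    select tracking_number, origin_zip
    from zips
    where origin_zip = destination_zip;
quit;
```

Unlike the MERGE approach, this does not require either dataset to be sorted first.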