I have the following dataset:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
;
PROC PRINT; RUN;
I want to link this data to another table but the thing is that the numbers in the other table are stored in the following format: 0012, 0023, 0023.
So I am looking for a way to do the following:
Check how long the number is
If length = 1, add three leading zeros
If length = 2, add two leading zeros
Any thoughts on how I can get this working?
Numbers are numbers so if the other table has the field as a number then you don't need to do anything. 13 = 0013 = 13.00 = ....
If the other table actually has a character variable then you need to convert one or the other.
char_number = put(number, Z4.);
number = input(char_number, 4.);
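To make the linking step concrete, here is a minimal sketch. The table name other, its variables code and descr, and the output name linked are made up for illustration; only survey and number come from the question. It assumes the other table stores the key as 4-character text.

```sas
data survey;
  input zip_code number;
  datalines;
1212 12
1213 23
1214 23
;

/* Hypothetical lookup table: key stored as zero-padded character text */
data other;
  input code $4. descr $;
  datalines;
0012 A
0023 B
;

proc sql;
  create table linked as
  select s.zip_code, s.number, o.descr
  from survey as s
       left join other as o
       /* convert the numeric key to zero-padded text only for the comparison */
       on put(s.number, z4.) = o.code;
quit;
```

Converting inside the join condition leaves both tables unchanged; if the join runs often, storing the converted key once may be the better choice.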
You can use the Zw. format (z4. here) to accomplish this:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
9999 999
8888 8
;
data survey2;
set survey;
number_long = put(number, z4.);
run;
If number is stored as a character variable and you need it to be four characters long, you could convert it to numeric first and then apply z4., like this:
want = put(input(number,best32.),z4.);
How should the code be completed to make this work?
Code:
data ms;
infile 'C';
input cr ls ms color $;
if input @; *statement that reads the line with one word and completes the color column*
run;
Input:
Blars
10 83287 10.00
20 1748956 30.00
30 2222222 73.00
40 833709 90.00
Klirs
10 922222 90.50
20 1222222 10.00
30 1111111 93.33
40 8998877 300.90
Expected output:
cr  ls       ms      color
10  83287    10.00   Blars
20  1748956  30.00   Blars
30  2222222  50.00   Blars
40  833709   73.00   Blars
10  922222   90.50   Klirs
20  1222222  10.00   Klirs
30  1111111  93.33   Klirs
40  8998877  300.90  Klirs
Attempted to read it
Just RETAIN the extra variable. You need some way to detect which type of line you currently are reading. When it has the COLOR just update the COLOR variable and do not write out an observation. When it has the actual data then read all of the fields and write an observation.
data ms;
infile 'C' truncover ;
length color $10 cr ls ms 8;
retain color;
input cr ?? @ ;
if missing(cr) then do;
color = _infile_;
delete;
end;
input ls ms ;
run;
Make sure to define the COLOR column long enough to store the longest value. This assumes there are no blank lines, as you mentioned in your comment on the original question.
Slightly different method than other solution.
Use INPUT @@ to read the full line and hold it in the automatic variable _infile_.
Check the _infile_ variable to see if it contains any digits; if so, process the line as data.
Otherwise, process it as a colour.
data have;
infile cards truncover;
*set length and retain color across rows;
length color $10 cr ls ms 8;
retain color;
*read in string;
input @@;
*check for any digits in string, if any are found, process as data;
if anydigit(_infile_) then do;
input cr ls ms;
output;
end;
*otherwise read in as color;
else input color $;
cards;
Blars
10 83287 10.00
20 1748956 30.00
30 2222222 73.00
40 833709 90.00
Klirs
10 922222 90.50
20 1222222 10.00
30 1111111 93.33
40 8998877 300.90
;;;;
run;
Richard, your code could even be more succinct.
* attempt to read first 2 chars as number;
* ?? suppresses errors;
input num ?? 2. @;
if missing(num) then
input @1 color $;
else do;
input @1 cr ls ms;
output;
end;
You can scan a held generic input line and then choose which input statement you want based on the scan.
data want;
length color $20 cr ls ms 8;
retain color;
infile 'c' missover;
input @;
if missing(input(scan(_infile_,1),??best12.)) then
input @1 color ;
else
input @1 cr ls ms ;
if not missing(cr);
run;
I'm working in SAS as a novice. I have two datasets:
Dataset1
Unique ID  ColumnA
1          15
1          39
2          20
3          10
Dataset2
Unique ID  ColumnB
1          40
2          55
2          10
For each UniqueID, I want to subtract each value of ColumnA from every value of ColumnB, and I would like to create a NewColumn that is 1 whenever 1 < ColumnB - ColumnA < 30. For the first row of Dataset1, where UniqueID = 1, I would want SAS to go through all the rows in Dataset2 that also have UniqueID = 1 and determine whether there are any rows in Dataset2 where the difference between ColumnB and ColumnA is greater than 1 and less than 30. For the first row of Dataset1, NewColumn should be assigned a value of 1 because 40 - 15 = 25. For the second row of Dataset1, NewColumn should be assigned a value of 0 because 40 - 39 = 1 (which is not greater than 1). For the third row of Dataset1, I again want SAS to go through every row of ColumnB in Dataset2 that has the same UniqueID as in Dataset1, so 55 - 20 = 35 (which is greater than 30), but NewColumn would still be assigned a value of 1 because (moving to row 3 of Dataset2, which has UniqueID = 2) 20 - 10 = 10, which satisfies the if statement.
So I want my output to be:
Unique ID  ColumnA  NewColumn
1          15       1
1          30       0
2          20       1
I have tried concatenating Dataset1 and Dataset2 into a FullDataset. Then I tried using a do loop statement but I can't figure out how to do the loop for each value of UniqueID. I tried using BY but that of course produces an error because that is only used for increments.
DATA FullDataset;
set Dataset1 Dataset2; /*Concatenate datasets*/
do i=ColumnB-ColumnA by UniqueID;
if 1<ColumnB-ColumnA<30 then NewColumn=1;
output;
end;
RUN;
I know I'm probably way off but any help would be appreciated. Thank you!
So, the way that answers your question most directly is the keyed set. This isn't necessarily how I'd do this, but it is fairly simple to understand (as opposed to a hash table, which is what I'd use, or a SQL join, probably what most people would use). This does exactly what you say: grabs a row of A, says for each matching row of B check a condition. It requires having an index on the datasets (well, at least on the B dataset).
data colA(index=(id));
input ID ColumnA;
datalines;
1 15
1 39
2 20
3 10
;;;;
data colB(index=(id));
input ID ColumnB;
datalines;
1 40
2 55
2 30
;;;;
run;
data want;
*base: the colA dataset - you want to iterate through that once per row;
set colA;
*now, loop while the check variable shows 0 (match found);
do while (_iorc_ = 0);
*bring in other dataset using ID as key;
set colB key=ID ;
* check to see if it matches your requirement, and also only check when _IORC_ is 0;
if _IORC_ eq 0 and 1 lt ColumnB-ColumnA lt 30 then result=1;
* This is just to show you what is going on, can remove;
put _all_;
end;
*reset things for next pass;
_ERROR_=0;
_IORC_=0;
run;
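For comparison, the SQL join mentioned above could look roughly like the sketch below. It is not the answer's own method; the data set names a_demo, b_demo and want_sql are made up so that the indexed colA and colB are left alone, and the values repeat the ones used in this answer.

```sas
/* Hypothetical copies of the sample data, read several pairs per line */
data a_demo;
  input ID ColumnA @@;
  datalines;
1 15 1 39 2 20 3 10
;

data b_demo;
  input ID ColumnB @@;
  datalines;
1 40 2 55 2 30
;

proc sql;
  create table want_sql as
  select a.ID,
         a.ColumnA,
         /* 1 if any matching row of b_demo satisfies the range check, else 0 */
         max(case when b.ColumnB - a.ColumnA > 1
                   and b.ColumnB - a.ColumnA < 30
                  then 1 else 0 end) as NewColumn
  from a_demo as a
       left join b_demo as b
       on a.ID = b.ID
  group by a.ID, a.ColumnA;
quit;
```

The left join keeps IDs that have no match in the second table, and they end up with NewColumn = 0. One caveat: the GROUP BY collapses rows of the first table that share both ID and ColumnA, so it only fits if such exact duplicates do not need to be kept separately.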
Is there an elegant way to check in SAS Base if a numeric value is made of only one kind of digit?
Example:
1 -> Yes
11 -> Yes
111 -> Yes
1111 -> Yes
1121 -> No
9999999 -> Yes
9999990 -> No
I would go with something like this.
Realize that SAS does not store leading 0s in numbers, so a value entered as 09999999 will pass: that leading 0 will not show up.
This converts the numbers to strings and then compares the individual characters in the string. Alter the format in the put statement as needed.
Also note that a decimal will fail because . will be compared to the numbers. If you need these to pass, then remove the . from the string.
data have;
input x;
datalines;
1
11
12
111
1111
1121
99999999
09999999
1.11
;
run;
data test;
set have;
pass = 1;
format temp $32.;
temp = strip(put(x,best32.));
do i=1 to length(temp)-1;
pass = pass and (substr(temp,i,1) = substr(temp,i+1,1));
if ^pass then leave;
end;
drop temp i;
run;
Just want to share an additional solution with regex:
data have;
input x;
datalines;
1
11
12
111
1111
1121
99999999
9999990
;
run;
data want;
set have;
if PRXMATCH("/\b1+\b|\b2+\b|\b3+\b|\b4+\b|\b5+\b|\b6+\b|\b7+\b|\b8+\b|\b9+\b|\b0+\b/",x);
run;
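A shorter pattern may also work here, using a backreference instead of listing every digit. This is only a sketch: the names have_demo, want_backref and txt are invented for the example, and it assumes whole numbers that print without scientific notation.

```sas
data have_demo;
  input x @@;
  datalines;
1 11 12 111 1121 9999990
;

data want_backref;
  set have_demo;
  length txt $32;
  /* convert to left-aligned text so the pattern can anchor at the start */
  txt = strip(put(x, best32.));
  /* capture the first digit, then allow only repeats of it; */
  /* the trailing whitespace part absorbs the blank padding of txt */
  if prxmatch('/^(\d)\1*\s*$/', txt);
  drop txt;
run;
```

The subsetting IF keeps only the rows where the pattern matches, the same way the PRXMATCH test above is used.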
I need help developing a de-hoc query for hoc (range) data. Below is an example of a Shares Outstanding HOC:
ID StartDT EndDT SharesOutstanding
ABC 01-Jan-2010 03-Feb-2013 100
ABC 04-Feb-2014 03-Sep-2014 160
XYZ 01-Jan-2011 03-Mar-2012 52
XYZ 04-Mar-2012 09-Aug-2013 108
XYZ 10-Aug-2013 03-Sep-2014 120
Now I want to de-hoc, that is, break the above range data into one row per day. Below is the desired output:
ID Date Shares
ABC 01-Jan-2010 100
ABC 02-Jan-2010 100
ABC 03-Jan-2010 100
ABC 04-Jan-2010 100
ABC 05-Jan-2010 100
.......
ABC 03-Feb-2014 100
ABC 04-Feb-2014 160
....till 03-Sep-2014
I am using SAS code with PROC SQL, but that is very time-consuming.
Need your help on this query at earliest
Thanks
Hitesh
This should be fairly easy with a data step and some do-loops.
data want(drop = StartDT EndDT i);
set have;
format date date9.;
do i = 0 to (EndDT-StartDT);
date = StartDT + i;
output;
end;
run;
Do you really want lots of repeated rows, though, or are you just interested in getting the difference of dates?
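If only the length of each range is needed, something like the following sketch might be enough. The data set names have_demo and spans and the variable n_days are made up for illustration, and it assumes StartDT and EndDT are real SAS dates (read here with the date11. informat) rather than text.

```sas
/* Hypothetical copy of the first two sample rows, with dates read as SAS dates */
data have_demo;
  input ID $ StartDT :date11. EndDT :date11. SharesOutstanding;
  format StartDT EndDT date11.;
  datalines;
ABC 01-Jan-2010 03-Feb-2013 100
ABC 04-Feb-2014 03-Sep-2014 160
;

data spans;
  set have_demo;
  /* SAS dates are day counts, so subtraction gives days; +1 counts both end points */
  n_days = EndDT - StartDT + 1;
run;
```

This gives one row per range instead of one row per day, so it stays small however long the ranges are.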
I have the following format:
value agecf 0 = "35-40" 1 = "41-45" 2 = "46-50" 3 = "51-55" 4 = "56-60";
But when I use the statement format age agecf.; I still get all of the individual observations (e.g. 35, 36, 37, ...) instead of the observations grouped into 5 levels. Why?
You just reversed the left and right sides of the format. The formatted value goes on the right, the original value on the left.
Below is an example using your format and one which is probably what you were trying to create.
proc format;
value agecf 0 = "35-40" 1 = "41-45" 2 = "46-50" 3 = "51-55" 4 = "56-60";
value newage 35-40="0" 41-45="1" 46-50="2" 51-55="3" 56-60="4";
run;
data test;
input value1;
value2=value1;
format value1 agecf. value2 newage.;
datalines;
35
45
50
37
46
55
60
;
proc print data=test;run;
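If the goal was to see the ages grouped into the five ranges with the range text as the label, a format along these lines might be closer. The format name agegrp and the data set name ages_demo are made up for this sketch; PROC FREQ is used because it groups by formatted values.

```sas
proc format;
  /* raw age ranges on the left, the label to display on the right */
  value agegrp 35-40 = "35-40"
               41-45 = "41-45"
               46-50 = "46-50"
               51-55 = "51-55"
               56-60 = "56-60";
run;

/* Hypothetical sample ages, several per line */
data ages_demo;
  input age @@;
  datalines;
35 45 50 37 46 55 60
;

proc freq data=ages_demo;
  tables age;
  /* the format does the grouping; the stored ages are unchanged */
  format age agegrp.;
run;
```

PROC PRINT would still list every observation, only with the label shown in place of the raw age; procedures that summarise, such as PROC FREQ, are where the five levels appear.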