Find the prime DATE - sas

2019-05-23 is a normal but great day. It is normal because nothing big happened to me; it is great because I found this story on my social network.
20190523 is a prime.
0190523 is a prime (let's just allow this leading zero).
190523 is a prime.
...
23 is a prime.
3 is a prime.
So my question is: are there any other dates that have the same property?
I would like a really neat method, so let's limit the search range to 2019-05-23 through 9999-12-31.
I hope you enjoy this puzzle.

There was just a discussion about how to generate prime numbers using the Sieve of Eratosthenes. So we can run that first, then loop over the dates, convert each one into its series of numbers, and check whether they are all prime.
data prime_dates ;
  /* Sieve of Eratosthenes: missing = prime, 1 = not prime */
  array sieve[99991231] _temporary_ ;
  sieve[1]=1;
  do i=2 to int(sqrt(hbound(sieve)));
    if sieve[i]=. then do j=i**2 by i to hbound(sieve);
      sieve[j]=1;
    end;
  end;
  /* Test every suffix of each date's YYYYMMDD digits, longest first */
  do date='23may2019'd to '31DEC9999'd ;
    string=put(date,yymmddn8.);
    prime=1;
    do pos=1 to 8 while(prime);
      prime=.=sieve[input(substr(string,pos),8.)];
    end;
    if prime then output;
  end;
  stop;
  drop i j pos prime;
  format date yymmdd10.;
run;
This results in 409 such dates. The next one is in August of this year. Here are the first 10.
Obs date string
1 2019-05-23 20190523
2 2019-08-23 20190823
3 2030-03-17 20300317
4 2036-03-17 20360317
5 2040-03-07 20400307
6 2040-08-23 20400823
7 2048-01-07 20480107
8 2060-03-17 20600317
9 2066-06-17 20660617
10 2070-01-03 20700103
Note that there is no need to add index 0 to the sieve. Dates on the 10th, 20th and 30th of the month end in 0, so every longer suffix is a multiple of 10 and the check fails before the single digit 0 is ever looked up.
I counted 1 as NOT prime. If you count 1 as prime, you get an additional 185 dates, the first being the first day of the year 2060.
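To see exactly which numbers the inner loop looks up for a single date, here is a minimal sketch. The variable name suffix is only for illustration and is not part of the program above; the step writes each suffix to the log without touching the sieve.

```sas
data _null_;
  /* Same conversion as the main program: date -> 8-digit string */
  string = put('23may2019'd, yymmddn8.);
  do pos = 1 to 8;
    /* substr() without a length returns the rest of the string;
       the 8. informat drops any leading zero */
    suffix = input(substr(string, pos), 8.);
    put pos= suffix=;
  end;
run;
```

For 2019-05-23 the suffixes are 20190523, 190523, 190523, 90523, 523, 523, 23 and 3. The repeats come from the leading zeros, which is why "0190523" and "190523" are the same lookup.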

Related

How do I create a new row within an if-condition in SAS?

What I have:
Number Cost Amount
52 98 1
108 50 3
922 12 1
What I want:
Number Cost
52 98
108 50
109 50
110 50
922 12
My dataset has a variable Amount. If Amount is 2 for a certain row, I want to create a new row right beneath it with the same Cost and the Number equal to that of the row above + 1. If the Amount is 3, I want to create two new rows right beneath it, both with the same Cost and with the Numbers being Number from row above +1 and Number from row above +2, and so on.
My final step would be to delete the Amount column, which I can do with
data want (drop=Amount);
  set have;
run;
I am having problems implementing this. My thought was to use proc sql insert into, but I am having trouble combining it with an if condition that runs through the Amount variable.
Code to reproduce table:
proc sql;
  create table want
    (Number num, Cost num, Amount num);
  insert into want
    values(52,98,1)
    values(108,50,3)
    values(922,12,1);
quit;
This can help you:
proc sort data=want out=want_s nodupkey;
  by Number;
run;
data result;
  keep Number Cost;
  set want_s;
  /* write one row per unit of Amount, bumping Number each time */
  do i=1 to Amount;
    output;
    Number=Number+1;
  end;
run;
You might need to take care that Number does not overlap with the next input row like below:
Number ; Amount
108 ; 10
110 ; 1
Use a DO loop to output AMOUNT rows. The loop's index variable can be NUMBER itself, so it increments as the rows are written.
Example (untested)
data want(keep=number cost);
  set have;
  do number = number to number + amount-1;
    output;
  end;
run;
However, in some cases you may not need to expand the data at all. Many SAS procedures provide a WEIGHT or FREQ statement that lets a variable play that statistical or processing role.
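As a rough sketch of the FREQ idea, assuming the unexpanded HAVE table from the question (Number, Cost, Amount) and that summary statistics on Cost are what is wanted, PROC MEANS can treat each row as Amount observations without creating the extra rows:

```sas
/* Assumes HAVE holds Number, Cost and Amount as in the question */
proc means data=have n sum mean;
  var cost;
  freq amount;   /* each row counts as AMOUNT observations */
run;
```

With the three sample rows this counts 5 observations in total, the same as the expanded table would contain.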

How to calculate month number in sas

Hi, I need to calculate the numeric value of a date in SAS, supposing that
01jan1960 is equal to 1
02jan1960 is equal to 2
So I need to calculate it for 01aug2020.
I used the intck function but got no output.
I want to do this in a data step only.
SAS stores dates as the number of days since 1 January 1960, with zero representing that first day of 1960. To represent a date in a program, just use a quoted string followed by the letter D. The string needs to be something the DATE informat can interpret.
Let's run a little test.
data _null_;
  do dt=0 to 3,"01-JAN-1960"d,'01AUG2020'd;
    put dt= +1 dt date9.;
  end;
run;
dt=0 01JAN1960
dt=1 02JAN1960
dt=2 03JAN1960
dt=3 04JAN1960
dt=0 01JAN1960
dt=22128 01AUG2020
So the date value for '01AUG2020'd is 22,128.
Subtraction works:
days_interval = '01Aug2020'd - '01Jan1960'd;
Or look at the unformatted value, since SAS stores dates as days counted from 01Jan1960:
days_interval = '01Aug2020'd;
format days_interval 8.;
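Since the question mentions INTCK, here is a small sketch of how it could be used inside a data step. The variable names days and months are illustrative. INTCK counts the interval boundaries crossed between two dates, so with the 'day' interval it agrees with the plain subtraction.

```sas
data _null_;
  /* number of day boundaries from 01JAN1960 to 01AUG2020 */
  days   = intck('day',   '01jan1960'd, '01aug2020'd);
  /* number of month boundaries over the same span */
  months = intck('month', '01jan1960'd, '01aug2020'd);
  put days= months=;   /* days=22128 months=727 */
run;
```

If 01jan1960 should count as 1 rather than 0, as in the question, add 1 to the result.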

Weighted Rank in SAS

I have a table with different scores for R60, R90, R120, R150 and R180. How can I make one table with a weighted rank based on these five variables and CODE_RAC, where NORM_PCT has 40% weightage, RB_PCT has 30% weightage and RB_PCT has 40% weightage?
Can you help me with this in SAS Enterprise Edition? Please find the sample attached from the dataset.
This wasn't done in Enterprise Edition, but I hope it helps.
There is a procedure, proc rank, which does the ranking for you. Alternatively, you can just sort the data by the calculated ranking variable (rank_calc in the example). I'm quite sure you could do this in a single step, but this way may be more informative.
data Begin;
length code_rac $10 norm_R60 3 rb_R60 3 Reso_R60 3;
input code_rac norm_R60 rb_R60 Reso_R60;
datalines;
first 10 6 2
second 0 0 10
third 8 6 4
forth 0 10 7
fifth 0 0 8
;
run;
data begin; /*Calculate weighted value for ranking*/
  set begin;
  rank_calc= norm_R60*0.4 + rb_R60*0.3 + Reso_R60*0.4;
run;
proc rank data=begin out=sorted_by_rank;
  var rank_calc;
  ranks my_rank;
run;
For more on ranking see http://www.lexjansen.com/nesug/nesug09/ap/AP01.pdf
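The sort-based alternative mentioned in this answer could look roughly like the sketch below. It assumes the begin data set with rank_calc already computed; the names sorted_desc and sort_rank are made up for this illustration. Unlike proc rank, rows with tied scores get consecutive ranks rather than a shared one.

```sas
/* Highest weighted score first */
proc sort data=begin out=sorted_desc;
  by descending rank_calc;
run;

/* The row number after sorting serves as the rank */
data sorted_desc;
  set sorted_desc;
  sort_rank = _n_;
run;
```

Sorting in descending order puts the largest rank_calc at rank 1; drop the descending keyword if the smallest value should come first.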

SAS: Replacing missing value with average of nearest neighbors

I am trying to find a quick way to replace missing values with the average of the two nearest non-missing values. Example:
Id Amount
1 10
2 .
3 20
4 30
5 .
6 .
7 40
Desired output
Id Amount
1 10
2 **15**
3 20
4 30
5 **35**
6 **35**
7 40
Any suggestions? I tried using the retain function, but I can only figure out how to retain last non-missing value.
I think what you are looking for might be more like interpolation. While this is not the mean of the two closest values, it might be useful.
There is a nifty little tool for interpolating in datasets called proc expand. (It should do extrapolation as well, but I haven't tried that yet.) It's very handy when making series of dates and cumulative calculations.
data have;
input Id Amount;
datalines;
1 10
2 .
3 20
4 30
5 .
6 .
7 40
;
run;
proc expand data=have out=Expanded;
convert amount=amount_expanded / method=join;
id id; /*second is column name */
run;
For more on the proc expand see documentation: https://support.sas.com/documentation/onlinedoc/ets/132/expand.pdf
This works:
data have;
input id amount;
cards;
1 10
2 .
3 20
4 30
5 .
6 .
7 40
;
run;
proc sort data=have out=reversed;
  by descending id;
run;
data retain_non_missing;
  set reversed;
  retain next_non_missing;
  if amount ne . then next_non_missing = amount;
run;
proc sort data=retain_non_missing out=ordered;
  by id;
run;
data final;
  set ordered;
  retain last_non_missing;
  if amount ne . then last_non_missing = amount;
  if amount = . then amount = (last_non_missing + next_non_missing) / 2;
run;
but as ever, will need extra error checking etc for production use.
The key idea is to sort the data into reverse order, allowing it to use RETAIN to carry the next_non_missing value back up the data set. When sorted back into the correct order, you then have enough information to interpolate the missing values.
There may well be a PROC to do this in a more controlled way (I don't know anything about PROC STANDARDIZE, mentioned in Reeza's comment) but this works as a data step solution.
Here's an alternative requiring no sorting. It does require IDs to be sequential, though that can be worked around if they're not.
It uses two set statements: one that gets the main (and previous) amounts, and one that keeps reading until the next amount is found. Here I use the sequence of id values to guarantee it will be the right record, but you could write this differently if needed (keeping track of which loop iteration you're on) if the id values aren't sequential or in any particular order.
I use the first.amount check to make sure we don't try to execute the second set statement more than we should (which would terminate early).
You need to do two things differently if you want first/last rows treated differently. Here I assume prev_amount is 0 if it's the first row, and I assume last_amount is missing, meaning the last row just gets the last prev_amount repeated, while the first row is averaged between 0 and the next_amount. You can treat either one differently if you choose, I don't know your data.
data have;
input Id Amount;
datalines;
1 10
2 .
3 20
4 30
5 .
6 .
7 40
;;;;
run;
data want;
  set have;
  by amount notsorted; *so we can tell if we have consecutive missings;
  retain prev_amount; *next_amount is auto-retained;
  if not missing(amount) then prev_amount=amount;
  else do;
    if _n_=1 then prev_amount=0; *or whatever you want to treat the first row as;
    *only read ahead at the start of a run of missings;
    if first.amount then do until ((next_id > id and not missing(next_amount)) or (eof));
      set have(rename=(id=next_id amount=next_amount)) end=eof;
    end;
    amount = mean(prev_amount,next_amount);
  end;
run;

use data step generate next observation

Case 1
Suppose the data are sorted by year then by month (always have 3 observations in data).
Year Month Index
2014 11 1.1
2014 12 1.5
2015 1 1.2
I need to copy the Index of the last month to a new observation
Year Month Index
2014 11 1.1
2014 12 1.5
2015 1 1.2
2015 2 1.2
Case 2
Year is removed from data. So we only have Month and Index.
Month Index
1 1.2
11 1.1
12 1.5
The data always cover 3 consecutive months, so 1 is the last month.
Still, the ideal output is
Month Index
1 1.2
2 1.2
11 1.1
12 1.5
I solved it by creating another dataset that contains only Month (1,2...12) and then right joining the original dataset to it twice. But I think there is a more elegant way to deal with this.
Case 1 can be a straightforward data step. Add end=eof to the set statement to create a variable eof that is 1 when the data step is reading the last row of the data set. The output statement writes a row on each iteration. If eof=1, a do block runs that increments the month by 1, rolling over from 12 to 1 and into the next year when needed, and outputs another row.
data want;
  set have end=eof;
  output;
  if eof then do;
    month=mod(month,12)+1; /* 12 wraps to 1, 11 becomes 12 */
    if month=1 then year=year+1;
    output;
  end;
run;
For case 2, I would switch to an sql solution. Self join the table to itself on month, incremented by 1 in the second table. Use the coalesce function to keep the values from the existing table if it exists. If not, use the values from the second table. Since a case crossing December-January will produce 5 months, limit the output to four rows using the outobs= option in proc sql to exclude the unwanted second January.
proc sql outobs=4;
  create table want as
  select
    /* mod(month,12)+1 wraps 12 to 1 without turning 11 into 0 */
    coalesce(t1.month,mod(t2.month,12)+1) as month,
    coalesce(t1.index,t2.index) as index
  from
    have t1
    full outer join have t2
    on t1.month = t2.month+1
  order by
    coalesce(t1.month,t2.month+1)
  ;
quit;