Merge and update one file based on another one - sas

Suppose I have two data sets (files). File1 consists of time periods, each with a label; File2 contains sub-periods without labels. I need to add labels to File2 based on the time intervals in File1: if a sub-period is contained in a period of File1, it should take the label from that period.
Can anyone help me, please?
File1:
data have1;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;
File2:
data have2;
input ID :$20. Start :date9. End :date9.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015
0001 31JAN2015 15FEB2015
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
......
;
File3 desired output:
data output;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 HospitalA ex005
0001 31JAN2015 15FEB2015 HospitalA ex005
0001 15FEB2015 30APR2015 HospitalA ex005
0001 01MAY2015 15MAY2015 HospitalA ex004
0001 16MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 15SEP2015 HospitalC ex005
0001 16SEP2015 31DEC2015 HospitalC ex005
......
;

Try this
data have1;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;
data have2;
input ID :$20. Start :date9. End :date9.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015
0001 31JAN2015 15FEB2015
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
;
data want(drop = s e rc);
   if _N_ = 1 then do;
      /* Load have1 into a hash keyed on ID; multidata keeps every period per ID */
      dcl hash h(dataset : 'have1(rename = (Start = s End = e))', multidata : 'Y');
      h.definekey('ID');
      h.definedata('s', 'e', 'Label', 'Role');
      h.definedone();
   end;
   set have2;
   /* Put s, e, Label and Role into the PDV without reading any rows */
   if 0 then set have1(rename = (Start = s End = e));
   call missing(s, e, Label, Role);
   /* Walk the periods of the current ID until one contains the sub-period */
   rc = h.find();
   do while (rc = 0);
      if Start >= s and End <= e then leave;
      call missing(Label, Role);
      rc = h.find_next();
   end;
run;
Result:
ID Start End Label Role
0001 2015-01-01 2015-01-30 HospitalA ex005
0001 2015-01-31 2015-02-15 HospitalA ex005
0001 2015-02-15 2015-04-30 HospitalA ex005
0001 2015-05-01 2015-05-15 HospitalA ex004
0001 2015-05-16 2015-05-31 HospitalA ex004
0001 2015-06-01 2015-09-15 HospitalC ex005
0001 2015-09-16 2015-12-31 HospitalC ex005
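The containment rule from the question is easy to check outside SAS. The following is a minimal sketch in Python, not part of the original answer; the names `periods` and `find_label` are hypothetical, and only the periods for ID 0001 from the sample data are included.

```python
from datetime import date

# Labelled periods (File1) for ID 0001, copied from the sample data
periods = [
    ("0001", date(2015, 1, 1), date(2015, 4, 30), "HospitalA", "ex005"),
    ("0001", date(2015, 5, 1), date(2015, 5, 31), "HospitalA", "ex004"),
    ("0001", date(2015, 6, 1), date(2015, 12, 31), "HospitalC", "ex005"),
]

def find_label(id_, start, end):
    """Return (label, role) of the first period with the same ID
    that fully contains [start, end], or (None, None) if there is none."""
    for pid, s, e, label, role in periods:
        if pid == id_ and s <= start and end <= e:
            return label, role
    return None, None

# Sub-period 31JAN2015 to 15FEB2015 lies inside 01JAN2015 to 30APR2015
print(find_label("0001", date(2015, 1, 31), date(2015, 2, 15)))
```

A sub-period that straddles two periods (for example 15APR2015 to 15MAY2015) is contained in neither and gets no label under this rule.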

Assign labels based on multiple conditions

Suppose I have the following:
ID Start_date End_date Hospital Work
00001 01JAN2015 15JAN2015 006 w
00001 16JAN2015 16JAN2015 006 p
00001 17JAN2015 20JAN2015 006 w
00001 21JAN2015 29JAN2015 006 f
00001 30JAN2015 02FEB2015 004 w
00001 03FEB2015 03FEB2015 004 s
00001 04FEB2015 08FEB2015 004 w
00001 09FEB2015 13FEB2015 004 f
00001 14FEB2015 16FEB2015 006 f
00001 17FEB2015 28DEC2016 006 w
00001 29DEC2016 31DEC2016 006 w
.... ..... ...... ... ...
Desired output:
ID Start_date End_date Hospital Work Flag1 Flag2
00001 01JAN2015 15JAN2015 006 w 1 4
00001 16JAN2015 16JAN2015 006 p 4 9
00001 17JAN2015 20JAN2015 006 w 9 4
00001 21JAN2015 29JAN2015 006 f 4 9
00001 30JAN2015 02FEB2015 004 w 9 2
00001 03FEB2015 03FEB2015 004 s 2 9
00001 04FEB2015 08FEB2015 004 w 9 4
00001 09FEB2015 13FEB2015 004 f 4 9
00001 14FEB2015 16FEB2015 006 f 9 2
00001 17FEB2015 28DEC2016 006 w 2 4
00001 29DEC2016 31DEC2016 006 w 4 Stop
.... ..... ...... ... ...
In other words, I need to add two columns, Flag1 and Flag2, containing indices according to the following criteria:
For the first Start_date of an ID, Flag1 must always be 1. Flag2 takes one of four values: 4 if Work is "w"; 9 if Work is not "w" (f, s or other); 2 if Hospital changes (here from 006 to 004 and then back to 006); and Stop for the end of the period, here 31DEC2016, but it could be 31DEC2019 or 31DEC2020 depending on the ID. In total I have 350 IDs, each repeated because there are many periods per ID.
On every other row, Flag1 takes the Flag2 value of the previous row.
Can anyone help me, please? Thank you in advance.
data source_data;
input ID :$5. Start_date :date9. End_flag :date9. Hospital :$3. Work :$1.;
format Start_date End_flag date9.;
datalines;
00001 01JAN2015 15JAN2015 006 w
00001 16JAN2015 16JAN2015 006 p
00001 17JAN2015 20JAN2015 006 w
00001 21JAN2015 29JAN2015 006 f
00001 30JAN2015 02FEB2015 004 w
00001 03FEB2015 03FEB2015 004 s
00001 04FEB2015 08FEB2015 004 w
00001 09FEB2015 13FEB2015 004 f
00001 14FEB2015 16FEB2015 006 f
00001 17FEB2015 28DEC2016 006 w
00001 29DEC2016 31DEC2016 006 w
;
proc sort data=source_data;
by ID start_date hospital;
run;
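data destination_data;
retain ID Start_date End_flag Hospital Work Flag1 Flag2;
attrib Flag1 length=$8 Flag2 length=$8;
set source_data;
by id start_date hospital;
/* Call lag() on every row so its queue is updated for each observation, */
/* regardless of how the condition below is evaluated */
prev_hospital=lag(hospital);
if work='w' then Flag2='4';
else Flag2='9';
if not first.ID and prev_hospital NE hospital then Flag2='2';
if last.ID then Flag2='Stop';
/* Flag1 is the Flag2 value of the previous row, or 1 at the start of an ID */
Flag2R=lag(Flag2);
if first.ID then flag1='1';
else flag1=Flag2R;
drop Flag2R prev_hospital;
run;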
proc print data=destination_data noobs;
run;

Run a code while stratifying by two variables

Suppose I have the following simple case (for explanatory purposes only; the original data are too complicated to show):
data have;
input ID :$20. Label1 :$20. Label2 :$20. Hours :$20.;
cards;
0001 rep1 w 345
0001 rep1 f 985
0001 rep1 w 367
0001 rep2 w 65
0001 rep2 w 123
0001 rep2 f 120
0002 rep6 f 45
0002 rep6 w 657
0002 rep6 w 45
0002 rep1 w 567
0002 rep1 f 78
0002 rep1 w 9
..... .... ... ...
;
I would like to sum, for each ID, the hours corresponding to "w", while also stratifying by Label1 (i.e. rep*). I used:
data want;
set have;
by ID Label1;
if first.ID
..........
if last.ID
.........
run;
Although I was able to stratify by ID, I was not able to stratify by Label1.
Is it possible to write something like: if first.ID and first.Label1 then ...?
During my attempts, SAS also gave me the following error:
"by variables are not properly sorted on data set have". The input data are sorted by ID.
Thank you in advance.
As you said, the input data is sorted by ID, so you can use first.ID. But the data is not sorted by Label1, so you cannot use first.Label1. If you want to use both, you have to sort by both variables:
proc sort data=have;
by ID Label1;
run;
But keep in mind that in your sample data first.Label1=1 will then occur not just once for Label1=rep1, but twice (once per ID):
ID Label1 Label2 Hours first.ID first.Label1
0001 rep1 w 345 1 1
0001 rep1 f 985 0 0
0001 rep1 w 367 0 0
0001 rep2 w 65 0 1
0001 rep2 w 123 0 0
0001 rep2 f 120 0 0
0002 rep1 w 567 1 1
0002 rep1 f 78 0 0
0002 rep1 w 9 0 0
0002 rep6 f 45 0 1
0002 rep6 w 657 0 0
0002 rep6 w 45 0 0
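How these flags come about can be sketched in a few lines. The following Python snippet is only an illustration, assuming the rows are already sorted by ID and Label1; `by_group_flags` is a hypothetical helper, not a SAS feature. The point it shows is that first.Label1 is set whenever Label1 changes or an earlier BY variable (here ID) changes.

```python
def by_group_flags(rows):
    """Yield (id, label1, first_id, first_label1) for rows sorted by (id, label1)."""
    prev_id = prev_label = None
    for id_, label1 in rows:
        first_id = int(id_ != prev_id)
        # A new ID always starts a new Label1 group as well
        first_label1 = int(first_id or label1 != prev_label)
        yield id_, label1, first_id, first_label1
        prev_id, prev_label = id_, label1

# Same ID/Label1 layout as the sorted sample data above
rows = ([("0001", "rep1")] * 3 + [("0001", "rep2")] * 3
        + [("0002", "rep1")] * 3 + [("0002", "rep6")] * 3)

flags = list(by_group_flags(rows))
for row in flags:
    print(*row)
```

Because rep1 appears under both IDs, it starts a new BY group twice, which is why testing first.Label1 alone is enough to detect the start of every ID/Label1 combination.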

Loop over time periods

Suppose I have the following data set:
ID Date_Start Date_End Flag1 Flag2
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 0
002 01JAN2015 31DEC2020 1 0
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 0
004 01JAN2011 31DEC2021 1 2
..... ......... ......... ..... ......
Desired output:
ID Date_Start Date_End Flag1 Flag2
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 10
002 01JAN2015 31DEC2020 1 10
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 10
004 01JAN2011 31DEC2021 1 2
..... ......... ......... ..... ......
In other words: if Flag2 = 0 and Flag1 = 1, replace the value in the Flag2 column with 10 for each ID as follows:
for replicated IDs, take the last time interval;
for unique IDs, take the one interval available.
I'm a newbie in SAS programming. I know that what I have to do is:
data mydata;
set input;
if Flag2 = 0 and Flag1 = 1 then Flag2 = 10;
run;
but I don't know how to handle periods and replicated IDs. Can anyone help me, please?
I'm not entirely sure here, but I think this is what you want.
data have;
input ID $ (Date_Start Date_End)(:date9.) Flag1 Flag2;
format Date_Start Date_End date9.;
datalines;
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 0
002 01JAN2015 31DEC2020 1 0
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 0
004 01JAN2011 31DEC2021 1 2
;
data want;
set have;
by ID;
if last.ID and flag1 = 1 and flag2 = 0 then flag2 = 10;
run;
Result
ID Date_Start Date_End Flag1 Flag2
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 10
002 01JAN2015 31DEC2020 1 10
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 10
004 01JAN2011 31DEC2021 1 2

What is wrong with my code to get correct GRADIENT?

I have a problem with rounding results. I have a gradient matrix, and my results are almost the same as expected. ALMOST.
I tried using FLOOR/ROUND/CEIL, but it improved nothing.
uint16_t t1=0,t2=0,t3=0;
float a,b,c;
uint16_t value=0;
for (int j = 0; j < count; j++) {
t1 = ceil(rgbLeft->r +(floor((rgbRight->r - rgbLeft->r) * j) / (count-1)));
t2 = ceil(rgbLeft->g +(floor((rgbRight->g - rgbLeft->g) * j) / (count-1)));
t3 = ceil(rgbLeft->b + (floor(rgbRight->b - rgbLeft->b) * j) / (count-1));
value = (t1 << 11) | (t2 << 5) | (t3);
vec.push_back(value);
}
INPUT
count is always 16; the matrix size is 16x8. The struct looks like this:
typedef struct{
unsigned int r:5;
unsigned int g:6;
unsigned int b:5;
}RGB;
rgbLeft and rgbRight are RGB structs.
For example, with the inputs 1 and 16 I get the line:
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010
The input for this example is 1 16 16 1 (start top, end top, start bottom, end bottom).
First I build the structs rgbLeft and rgbRight from the input (1, 16, 16, 1), where val is start top (rgbLeft) or end top (rgbRight):
rgb->r = (val >> 11);
rgb->g = (val >> 5);
rgb->b = (val);
After that I run the interpolation loop shown at the top of the question.
Finally I get:
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010
0002 0002 0003 0004 0005 0006 0006 0007 0008 0009 000A 000A 000B 000C 000D 000E
0004 0004 0005 0005 0006 0006 0007 0007 0008 0008 0009 0009 000A 000A 000B 000C
0006 0006 0006 0006 0007 0007 0007 0007 0008 0008 0008 0008 0009 0009 0009 000A
0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008
000A 0009 0009 0009 0008 0008 0008 0008 0007 0007 0007 0007 0006 0006 0006 0006
000C 000B 000A 000A 0009 0009 0008 0008 0007 0007 0006 0006 0005 0005 0004 0004
000E 000D 000C 000B 000A 000A 0009 0008 0007 0006 0006 0005 0004 0003 0002 0002
0010 000F 000E 000D 000C 000B 000A 0009 0008 0007 0006 0005 0004 0003 0002 0001
But I think (??) I should get something like this:
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010
0002 0003 0004 0005 0005 0006 0007 0008 0008 0009 000A 000B 000B 000C 000D 000E
0004 0005 0005 0006 0006 0007 0007 0008 0008 0009 0009 000A 000A 000B 000B 000C
0006 0006 0007 0007 0007 0007 0008 0008 0008 0008 0009 0009 0009 0009 000A 000A
0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008
000A 000A 0009 0009 0009 0009 0008 0008 0008 0008 0007 0007 0007 0007 0006 0006
000C 000B 000B 000A 000A 0009 0009 0008 0008 0007 0007 0006 0006 0005 0005 0004
000E 000D 000C 000B 000B 000A 0009 0008 0008 0007 0006 0005 0005 0004 0003 0002
0010 000F 000E 000D 000C 000B 000A 0009 0008 0007 0006 0005 0004 0003 0002 0001
For example, the input 0 0 3200 1800 is OK.
Here is example code to test each line:
https://wandbox.org/permlink/bdkxrjxYtq6LHhMg
EDIT:
I changed the code a little:
uint8_t t1=0,t2=0,t3=0;
int a=0,b=0,c=0;
uint16_t value=0;
for (int j = 0; j < count; j++) {
t1 =round(rgbLeft->r + (((floor(rgbRight->r - rgbLeft->r) * j) / (count-1))));
t2 =round(rgbLeft->g + (((floor(rgbRight->g - rgbLeft->g) * j) / (count-1))));
t3 =round(rgbLeft->b + (((floor(rgbRight->b - rgbLeft->b) * j) / (count-1))));
value = (t1 << 11) | (t2 << 5) | (t3);
vec.push_back(value);
}
and I get
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010
0002 0003 0004 0004 0005 0006 0007 0008 0008 0009 000A 000B 000C 000C 000D 000E
0004 0005 0005 0006 0006 0007 0007 0008 0008 0009 0009 000A 000A 000B 000B 000C
0006 0006 0007 0007 0007 0007 0008 0008 0008 0008 0009 0009 0009 0009 000A 000A
0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008 0008
000A 000A 0009 0009 0009 0009 0008 0008 0008 0008 0007 0007 0007 0007 0006 0006
000C 000B 000B 000A 000A 0009 0009 0008 0008 0007 0007 0006 0006 0005 0005 0004
000E 000D 000C 000C 000B 000A 0009 0008 0008 0007 0006 0005 0004 0004 0003 0002
0010 000F 000E 000D 000C 000B 000A 0009 0008 0007 0006 0005 0004 0003 0002 0001
Only 4 points differ from the expected output!
EDIT:
tl:1
tr 16
bl 16
br 1
RGB RGBtl, RGBtr, RGBbl, RGBbr;
std::vector<uint16_t> firstColumn,lastColumn;
valueToColorRGB(&RGBtl,tl);
valueToColorRGB(&RGBtr,tr);
valueToColorRGB(&RGBbl,bl);
valueToColorRGB(&RGBbr,br);
colorRGBtoVector(firstColumn,&RGBtl,&RGBbl,size.height);
colorRGBtoVector(lastColumn,&RGBtr,&RGBbr,size.height);
for (int j = 0; j < size.height; j++) {
tl = firstColumn[j];
tr = lastColumn[j];
valueToColorRGB(&RGBtl,tl);
valueToColorRGB(&RGBtr,tr);
colorRGBtoVector2(vec3,&RGBtl,&RGBtr,size.width);
}
void valueToColorRGB(RGB *rgb, const uint16_t &val){
rgb->r = (val >> 11);
rgb->g = (val >> 5);
rgb->b = (val);
}
void colorRGBtoVector2(std::vector<uint16_t> &vec, RGB *rgbLeft, RGB *rgbRight, const uint16_t &count){
uint8_t t1=0,t2=0,t3=0;
int a=0,b=0,c=0;
uint16_t value=0;
for (int j = 0; j < count; j++) {
t1 =(rgbLeft->r + ((floor(rgbRight->r - rgbLeft->r) * j) / (count-1)));
t2 =(rgbLeft->g + ((floor(rgbRight->g - rgbLeft->g) * j) / (count-1)));
t3 =(rgbLeft->b + ((floor(rgbRight->b - rgbLeft->b) * j) / (count-1)));
value = (t1 << 11) | (t2 << 5) | (t3);
vec.push_back(value);
}
}
Your actual question seems to boil down to:
idk why (14-2) * 1 / 15 + 2 = 2
Deriving the values from this, you would have:
rgbRight: 14U
rgbLeft: 2U
count: 16
j: 1
Your equation in the question, t3 = ceil(rgbLeft->b + (floor(rgbRight->b - rgbLeft->b) * j) / (count-1)), would play out in the following order:
1. rgbRight->b - rgbLeft->b: 12U
2. floor(rgbRight->b - rgbLeft->b): 12.0
3. floor(rgbRight->b - rgbLeft->b) * j: 12.0
4. count - 1: 15
5. (floor(rgbRight->b - rgbLeft->b) * j) / (count-1): 0.8
6. rgbLeft->b + (floor(rgbRight->b - rgbLeft->b) * j) / (count-1): 2.8
7. ceil(rgbLeft->b + (floor(rgbRight->b - rgbLeft->b) * j) / (count-1)): 3.0
Finally, the assignment back to t3 would cast, with an ignored warning, back to 3U, not 2U.
I believe what is confusing you is that your linked tester doesn't use doubles: (unsigned int)start.b + (((stop.b - start.b) * i) / (elems - 1)), which will play out in the following order:
1. (unsigned int)start.b: 2U
2. stop.b - start.b: 12U
3. (stop.b - start.b) * i: 12
4. elems - 1: 15
5. ((stop.b - start.b) * i) / (elems - 1): 0
6. (unsigned int)start.b + (((stop.b - start.b) * i) / (elems - 1)): 2U
The key point is that step 5 in this list performs integer division and thus returns 0, while step 5 in the previous list performs floating-point division and returns 0.8.
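The difference between the two evaluation orders can be reproduced with a short sketch. It is written in Python rather than C++ purely for brevity; `interp_int` and `interp_float` are hypothetical names, and the sketch assumes non-negative differences, where Python's `//` and C++ integer division agree.

```python
import math

def interp_int(left, right, j, count):
    """Integer arithmetic throughout, as in the linked tester."""
    return left + ((right - left) * j) // (count - 1)

def interp_float(left, right, j, count):
    """Floating-point division followed by ceil, as in the question's loop."""
    return math.ceil(left + (math.floor(right - left) * j) / (count - 1))

print(interp_int(2, 14, 1, 16))    # 2: 12 // 15 is 0
print(interp_float(2, 14, 1, 16))  # 3: 12 / 15 is 0.8, and ceil(2.8) is 3

# The integer version over a whole row, with left=2 and right=14
row = [interp_int(2, 14, j, 16) for j in range(16)]
print(" ".join(f"{v:04X}" for v in row))
```

The last line prints the same values as the second row of the output the question lists as its actual result (0002 0002 0003 ... 000E), which is consistent with that output having been produced by integer division.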

Adding digits in list in python

I have a sentence like "Q 000 1111 00001 0001 00 //SOME_STRING". I want to put everything except Q and //SOME_STRING into a list in Python, so that the resulting list contains only 000 1111 00001 0001 00.
How can I do this?
import re
data = "Q 000 1111 00001 0001 00 //SOME_STRING"
digits = re.findall(r"\b\d+\b",data)
Test
>>> re.findall(r"\b\d+\b","Q 000 1111 00001 0001 00 //SOME_STRING234zzzz")
['000', '1111', '00001', '0001', '00']
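import re
def filter_digits(bar):
    # Truthy only when the whole token consists of digits
    return re.search(r"^\d+$", bar)
foo = "Q 000 1111 00001 0001 00 //SOME_STRING"
foo = foo.split(' ')
# filter() returns an iterator in Python 3, so wrap it in list()
foo = list(filter(filter_digits, foo))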
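If a regular expression feels heavy for this, the same result can be had with plain string methods. This is a minimal sketch, assuming the tokens are separated by whitespace; `extract_digit_tokens` is a hypothetical helper name. Note that `str.isdigit` also accepts some non-ASCII digit characters, which does not matter for input like the one in the question.

```python
def extract_digit_tokens(text):
    """Return the whitespace-separated tokens that consist only of digits."""
    return [token for token in text.split() if token.isdigit()]

data = "Q 000 1111 00001 0001 00 //SOME_STRING"
print(extract_digit_tokens(data))
```

Because the tokens stay strings, leading zeros such as in 00001 are preserved.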