The following is an example of the data I have:
data testretain;
input SUBJ visit parameter value vistype$ basevalue$;
cards;
01 1 1 152 screen .
01 1 2 22 screen .
01 1 3 1000 screen .
01 2 1 154 random YES
01 2 2 23 random YES
01 2 3 1005 random YES
01 3 1 155 visit .
01 3 2 21 visit .
01 3 3 1003 visit .
;
run;
Where basevalue is YES, I want that row's value carried forward as a baseline to each later visit for the same parameter. This is how I want the output to look:
SUBJ visit parameter value vistype basevalue BASE
01 1 1 152 screen .
01 1 2 22 screen .
01 1 3 1000 screen .
01 2 1 154 random YES 154
01 2 2 23 random YES 23
01 2 3 1005 random YES 1005
01 3 1 155 visit . 154
01 3 2 21 visit . 23
01 3 3 1003 visit . 1005
I tried the following code:
data testretain1;
set testretain;
if basevalue='YES' then BASE=value;
retain BASE;
run;
However, it doesn't work: the value 1005 gets carried forward to every later observation.
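Sort the data so that all the results for the same subject and parameter are together; then RETAIN solves this easily. Reset BASE at the start of each parameter so that one parameter's baseline does not carry into the next.
proc sort data=testretain out=have;
by SUBJ parameter visit;
run;
data want;
set have;
by SUBJ parameter visit;
if first.parameter then BASE=.;
if basevalue='YES' then BASE=value;
retain BASE;
run;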
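If re-sorting the data is undesirable, the same result can probably be reached with a self-join in PROC SQL. The following is only a sketch, not the answerer's method: it assumes at most one baseline row (basevalue='YES') per subject and parameter, and the output name want_sql is made up for illustration.

```sas
/* Sample data from the question */
data testretain;
  input SUBJ visit parameter value vistype $ basevalue $;
  cards;
01 1 1 152 screen .
01 1 2 22 screen .
01 1 3 1000 screen .
01 2 1 154 random YES
01 2 2 23 random YES
01 2 3 1005 random YES
01 3 1 155 visit .
01 3 2 21 visit .
01 3 3 1003 visit .
;
run;

proc sql;
  /* want_sql is a hypothetical output name */
  create table want_sql as
  select a.*,
         /* fill BASE only from the baseline visit onward */
         case when a.visit >= b.visit then b.value else . end as BASE
  from testretain as a
       left join
       (select SUBJ, parameter, visit, value
          from testretain
          where basevalue = 'YES') as b
       on a.SUBJ = b.SUBJ and a.parameter = b.parameter
  order by a.SUBJ, a.visit, a.parameter;
quit;
```

The join keeps the original visit-by-parameter ordering in the output. If a subject could have more than one baseline row per parameter, the join would duplicate rows, so the RETAIN approach is the safer choice in that case.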
I'm working on a panel dataset that has missing values for four variables (at the start, at the end, and in the middle of panels). I would like to remove any panel that has missing values in its entirety.
This is the code I have tried to use so far:
bysort BvD_ID YEAR: drop if sum(!missing(REV_LAY,EMP_LAY,FX_ASSET_LAY,MATCOST_LAY))==0
This code removes the observations that have a missing value in any of the four variables, but it keeps the remaining observations of the same panel instead of dropping the whole panel.
Example data:
Firm_ID Year REV_LAY EMP_LAY FX_ASSET_LAY
001 2001 80 25 120
001 2002 75 . 122
001 2003 82 32 128
002 2001 40 15 45
002 2002 42 18 48
002 2003 45 20 50
In the above sample data, I want to drop panel Firm_ID = 001 completely.
You can do something like:
clear
input Firm_ID Year REV_LAY EMP_LAY FX_ASSET_LAY
001 2001 80 25 120
001 2002 75 . 122
001 2003 82 32 128
002 2001 40 15 45
002 2002 42 18 48
002 2003 45 20 50
end
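* remember the original row order
generate index = _n
* running count of rows with any missing value; the last row of each firm holds the firm total
bysort Firm_ID (index): generate todrop = sum(missing(REV_LAY, EMP_LAY, FX_ASSET_LAY))
* drop every row of a firm whose total is nonzero
by Firm_ID: drop if todrop[_N]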
list Firm_ID Year REV_LAY EMP_LAY FX_ASSET_LAY
     +-----------------------------------------------+
     | Firm_ID   Year   REV_LAY   EMP_LAY   FX_ASS~Y |
     |-----------------------------------------------|
  1. |       2   2001        40        15         45 |
  2. |       2   2002        42        18         48 |
  3. |       2   2003        45        20         50 |
     +-----------------------------------------------+
I need to figure out how to tabulate all possible combinations of data in a dataset. I have a dataset where each person has 2 rows: one row for an activity score and one row for a total score on a test. There are variables for the score at each visit. A person may have anywhere from 1 to 5 visits. I am looking for all possible pairs of visit scores for a given person, for each type of score.
For example, here is code to generate the sample data structure.
data example;
input name $ type $ visit1-visit5;
datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
John activity 11 20 25 20 21
John total 13 15 17 19 22
Steve activity 6 . . . .
Steve total 9 . . . . .
;
run;
I would like to have a dataset that would give me a structure as follows:
Bob activity 10 13
Bob activity 10 16
Bob activity 13 16
Bob total 13 19
Bob total 13 17
Bob total 19 17
John (rows for all possible combinations)
Steve - would have no rows, since he only has one visit (no combinations possible)
Any suggestions?
For N choose 2 and the output structure you want, a couple of nested DO loops will suffice.
data example;
input name $ type $ visit1-visit5;
datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
John activity 11 20 25 20 21
John total 13 15 17 19 22
Steve activity 6 . . . .
Steve total 9 . . . . .
;;;;
run;
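data by2;
set example;
array v[*] visit:;
/* number of non-missing visits; assumes they fill visit1..visitN with no gaps */
n=n(of v[*]);
do i = 1 to n;
col1 = v[i];
/* pair visit i with every later visit j */
do j = i + 1 to n;
col2 = v[j];
output;
end;
end;
drop i j visit:;
run;
proc print;
run;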
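The nested loops above rely on the non-missing visits being contiguous from visit1 onward, which holds for the sample data. If a person could skip a visit in the middle (an assumption that goes beyond the question), a variant that loops over all five slots and skips missing ones might look like the sketch below. The dataset name by2_gaps and the extra row for Ann are hypothetical.

```sas
/* Sample data, plus one hypothetical row with a gap at visit2 */
data example;
  input name $ type $ visit1-visit5;
  datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
Ann activity 8 . 12 . .
Steve activity 6 . . . .
;
run;

data by2_gaps;
  set example;
  array v[*] visit1-visit5;
  do i = 1 to dim(v) - 1;
    /* skip a missing first member of the pair */
    if missing(v[i]) then continue;
    do j = i + 1 to dim(v);
      /* skip a missing second member of the pair */
      if missing(v[j]) then continue;
      col1 = v[i];
      col2 = v[j];
      output;
    end;
  end;
  drop i j visit1-visit5;
run;

proc print data=by2_gaps;
run;
```

For rows without gaps this should produce the same pairs as the original loops; Steve still yields no rows because he has only one non-missing visit.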
Please help me solve this problem. I have some ideas, but none of them gives the desired result. The dataset I have:
Site Num Pres Began Start A B C
01 101 yes no yes 1 1 3
01 101 no yes yes 2 1 7
01 102 yes yes no 1 2 1
The output I want (a text file):
Site Num Pres Began Start Quantity
01 101 yes no yes 1
1
3
01 101 no yes yes 2
1
7
01 102 yes yes no 1
2
1
If you have any thoughts on this, I will be very grateful!!!
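One possible approach, sketched in SAS on the assumption that the data lives in a SAS dataset: a DATA _NULL_ step can write several output lines per observation with repeated PUT statements. The variable types, the dataset name have, and the file name want.txt are all assumptions made for illustration, and no attempt is made to align B and C under the Quantity column.

```sas
/* Assumed layout: Site and the yes/no flags are character, the rest numeric */
data have;
  input Site $ Num Pres $ Began $ Start $ A B C;
  datalines;
01 101 yes no yes 1 1 3
01 101 no yes yes 2 1 7
01 102 yes yes no 1 2 1
;
run;

data _null_;
  set have;
  file "want.txt";  /* hypothetical output path */
  /* write the header once, before the first observation */
  if _n_ = 1 then put "Site Num Pres Began Start Quantity";
  /* first line: identifying fields plus A */
  put Site Num Pres Began Start A;
  /* B and C each go on their own line */
  put B;
  put C;
run;
```

If the continuation lines need to sit under the Quantity column, column pointer controls such as `put @27 B;` could be used instead of the plain PUT statements.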
I would like to know how to configure Spoon to import this data correctly, given that the fields are delimited by spaces.
Also, the second-to-last field, SP_SEC, is not always populated and may be blank. Will that affect the import?
Here is the data as I have it:
SP_NLE SP_LIB SP_DEP SP_PRV SP_DST SP_APP SP_APM SP_NOM SP_NAC SP_SEX SP_GRI SP_SEC SP_DOC
00000001 000090 70 03 04 BARDALES AHUANARI RENE 19111116 2 10 8
00000003 000001 25 01 01 MEZA DE RUIZ CARLOTA 19400119 2 20 1 1
00000004 000001 25 01 01 BARDALES TORRES JOYCE 19580122 2 20 9 1
00000005 244246 25 01 02 RAMIREZ RUIZ FRANCISCO 19600309 1 20 7 1
00000006 000001 25 01 01 SILVA RIVERA DE RIOS ALICIA 19570310 2 20 5 1
00000008 000001 25 01 01 PACAYA MANIHUARI MANUEL 19401215 1 10 1 1
00000009 233405 25 01 02 TORRES MUÑOZ GLADYS 19650902 2 20 0 1
00000010 000508 25 01 01 OLIVOS RODAS BRITALDO 19510924 1 20 3 1
00000011 000001 25 01 01 ESCUDERO HERNANDEZ JULIA ISABEL 19351118 2 30 1
00000012 000001 25 01 01 YAICATE TARICUARIMA RICARDO 19560118 1 20 0 1
00000013 000001 25 01 01 ESPINOZA DE PINEDO ALEGRIA 19371108 2 10 1
00000014 000001 25 01 01 GARCIA PINCHI RICARDO 19650315 1 30 6 1
00000015 236352 09 01 01 LAO ESPINOZA ALINA 19601217 2 30 4 1
00000017 219532 25 01 01 YAICATE YAHUARCANI OLGA 19530706 2 10 1 1
Please help: what should go in the "Regular Expression" field, and what should go in the "Separator" field on the Content tab? Does anything need to be set in any other section?
Any suggestions?
Currently I have two datasets with similar variable lists. Each dataset has a procedure variable. I want to compare the frequency of the procedure variable between datasets. I created a flag in both datasets to identify the source dataset and was going to merge them, but they don't have a common identifier. How do I merge the datasets without deleting any observations? This isn't just a simple MERGE without a BY statement, right?
Currently have:
Data.a Data.b
pproc proc1_numb
70 9
71 15
77 24
80 80
81 42
83 71
86 66
87 125
121 159
125 242
Want Output:
pproc freq
9 1
15 1
24 1
42 1
66 1
70 1
71 2
77 1
80 2
81 1
83 1
86 1
87 1
121 1
125 2
159 1
242 1
If I understand your question properly, you should just concatenate the two datasets into one and rename the variable. Then you can use PROC MEANS to get the frequencies. Something like this:
data all;
set a
b(rename=(proc1_numb=pproc));
run;
proc means nway data=all noprint;
class pproc;
output out=want(drop=_type_ rename=(_freq_=freq));
run;
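As an alternative to PROC MEANS, PROC FREQ can produce the same counts from the concatenated data. The sketch below rebuilds the two sample datasets from the question so it runs on its own; the output name want_freq is made up for illustration.

```sas
/* Sample values from the question */
data a;
  input pproc @@;
  datalines;
70 71 77 80 81 83 86 87 121 125
;
run;

data b;
  input proc1_numb @@;
  datalines;
9 15 24 80 42 71 66 125 159 242
;
run;

/* Stack the two datasets under a common variable name */
data all;
  set a
      b(rename=(proc1_numb=pproc));
run;

/* One row per distinct procedure code with its count */
proc freq data=all noprint;
  tables pproc / out=want_freq(drop=percent rename=(count=freq));
run;

proc print data=want_freq;
run;
```

Adding the source flag from each dataset to the TABLES request (for example `tables source*pproc`, where source is a hypothetical flag variable) would give the counts per dataset rather than overall.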