How to split a combination of numeric and characters into multiple columns - sas

I want to split a variable such as "15to16" into two columns, so that for that row the values 15 and 16 land in separate columns. That is, I want to get from this
+--------+
| change |
+--------+
| 15to16 |
| 9to8   |
| 6to5   |
| 10to16 |
+--------+
to this
+--------+------+----+
| change | from | to |
+--------+------+----+
| 15to16 |   15 | 16 |
| 9to8   |    9 |  8 |
| 6to5   |    6 |  5 |
| 10to16 |   10 | 16 |
+--------+------+----+
Could someone help me out? Thanks in advance!

data have;
input change $;
cards;
15to16
9to8
6to5
10to16
;
run;
data want;
set have;
from = input(scan(change,1,'to'), 8.);
to = input(scan(change,2,'to'), 8.);
run;
N.B. In this case the SCAN function treats t and o as two separate single-character delimiters rather than looking for the word "to". The approach still works because SCAN by default treats consecutive delimiters as a single delimiter, so the "to" between the two numbers acts as one separator.
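If the text around the numbers could itself contain a t or an o, splitting on the literal string may be safer. A minimal sketch, assuming the have data set above; the names want_literal and pos are illustrative and not from the original answer:

```sas
/* Sketch: split on the literal string 'to' rather than on the characters t and o. */
data want_literal;                 /* illustrative data set name */
    set have;
    pos = find(change, 'to');      /* position of the first 'to', 0 if absent */
    if pos > 1 then do;
        /* SUBSTRN tolerates out-of-range positions and returns an empty string */
        from = input(substrn(change, 1, pos - 1), 8.);
        to   = input(substrn(change, pos + 2), 8.);
    end;
    drop pos;
run;
```

Rows without a "to" after at least one leading character are left with missing from and to values.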

Regular expressions use the metacharacters ( ) to define groups whose contents can be retrieved from capture buffers with PRXPOSN. In this case each capture buffer holds one or more consecutive digits (\d+), which INPUT converts to a numeric value.
data have;
  input change $20.;
  datalines;
15to16
9to8
6to5
10to16
;
run;
data want;
  set have;
  rx = prxparse('/^\s*(\d+)\s*to\s*(\d+)\s*$/');  /* constant pattern, compiled once */
  if prxmatch(rx, change) then do;
    from = input(prxposn(rx, 1, change), 12.);
    to   = input(prxposn(rx, 2, change), 12.);
  end;
  drop rx;
run;

You can get the result you want by declaring a delimiter when you create the dataset. However, you did not provide enough information about your other variables or how you import them.
Data want;
INFILE datalines DELIMITER='to';
INPUT from to;
datalines;
15to16
9to8
6to5
10to16
;
Run;
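Like SCAN, DELIMITER='to' treats t and o as two separate delimiter characters. If the two-character string itself should act as the delimiter, the INFILE option DLMSTR= is one alternative. A minimal sketch under that assumption; the data set name want_dlmstr is illustrative:

```sas
/* Sketch: DLMSTR= uses the whole string 'to' as a single delimiter. */
data want_dlmstr;                  /* illustrative data set name */
    infile datalines dlmstr='to';
    input from to;
    datalines;
15to16
9to8
6to5
10to16
;
run;
```

For the sample rows both options read the same two numbers; the difference only matters when a lone t or o can appear in the data.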

Related

Invert two strings inside brackets

I have a coordinates column that refers to an ID with 3 dimensions [z][x][y]. What I am trying to do is to invert the [x] dimension with the [y] so that it becomes a table with (still) 3 dimensions but ordered differently: [z][y][x]
Here is a snippet of the column:
+------------------------+
| COORDINATES            |
+------------------------+
| ID_01.02.a[][01][20]   |
| ID_02.00[010][02][017] |
| ID_03.01[][][010]      |
| ID_04.01.w[010][][]    |
+------------------------+
Here is the code to reproduce this snippet:
data have;
input coordinates :$30.;
datalines;
ID_01.02.a[][01][20]
ID_02.00[010][02][017]
ID_03.01[][][010]
ID_04.01.w[010][][]
;
The final table is expected to be:
+------------------------+
| COORDINATES            |
+------------------------+
| ID_01.02.a[][20][01]   |
| ID_02.00[010][017][02] |
| ID_03.01[][010][]      |
| ID_04.01.w[010][][]    |
+------------------------+
When I tried the SCAN function, it did not work because I could not find a robust word index, e.g. x_after = trim(scan(coordinates,2,'[]')). If the z axis is not empty, this returns the z axis; if the z axis is empty, it returns the x axis.

You can do it with SCAN(). Use [] as the delimiters together with the m modifier, so that consecutive delimiters are not collapsed. Your strings then split into 6 values rather than 4, because there is an empty value between each ] and the [ that follows it.
Here is a way that shows what is happening.
data have;
input COORDINATES $50.;
cards;
ID_01.02.a[][01][20]
ID_02.00[010][02][017]
ID_03.01[][][010]
ID_04.01.w[010][][]
;
data want;
set have;
length want $50;
array temp[6] $50;
do i=1 to dim(temp);
temp[i]=scan(coordinates,i,'[]','m');
end;
want = cats(temp[1],'[',temp[2],'][',temp[6],'][',temp[4],']');
drop i;
run;
Results:
Obs  COORDINATES             want                    temp1       temp2  temp3  temp4  temp5  temp6
1    ID_01.02.a[][01][20]    ID_01.02.a[][20][01]    ID_01.02.a                01            20
2    ID_02.00[010][02][017]  ID_02.00[010][017][02]  ID_02.00    010           02            017
3    ID_03.01[][][010]       ID_03.01[][010][]       ID_03.01                                010
4    ID_04.01.w[010][][]     ID_04.01.w[010][][]     ID_04.01.w  010

You can access regular expression power via PRXCHANGE to modify values containing both simple and complex patterns.
Example:
data have;
input COORDINATES $char50.;
datalines;
ID_01.02.a[][01][20]
ID_02.00[010][02][017]
ID_03.01[][][010]
ID_04.01.w[010][][]
;
data want;
  set have;
  put coordinates;
  coordinates = prxchange
    ( 's/(.*?\[.*?\])(\[.*?\])(\[.*\])/$1$3$2/'
    , 1
    , coordinates
    );
  put coordinates /;
run;
Log:
ID_01.02.a[][01][20]
ID_01.02.a[][20][01]
ID_02.00[010][02][017]
ID_02.00[010][017][02]
ID_03.01[][][010]
ID_03.01[][010][]
ID_04.01.w[010][][]
ID_04.01.w[010][][]

Grouping child items and displaying parent sum

I have the following table:
+-------+------+-------+
| group | item | value |
+-------+------+-------+
| 1     | a    |    10 |
| 1     | b    |    20 |
| 2     | b    |    30 |
| 2     | c    |    40 |
+-------+------+-------+
I would like to group the table by group, insert the grouped sum into value, and then ungroup:
+------+-------+
| item | value |
+------+-------+
| 1    |    30 |
| a    |    10 |
| b    |    20 |
| 2    |    70 |
| b    |    30 |
| c    |    40 |
+------+-------+
The result should be read as follows: items a and b belong to group 1, whose sum is 30, and items b and c belong to group 2, whose sum is 70.

Such a transformation usually indicates a reporting requirement rather than a data structure that is useful for downstream processing. PROC REPORT can produce output in the desired form.
data have;
  infile datalines;
  input group $ item $ value @@;  /* @@ holds the line so several observations are read from it */
  datalines;
1 a 10 1 b 20 2 b 30 2 c 40
;
proc report data=have;
  column group item value;
  define group / order order=data noprint;
  break before group / summarize;
  compute item;
    if missing(item) then item = group;  /* on the summary row, show the group id */
  endcomp;
run;

I assume that both group and item are character variables:
data have;
  infile datalines firstobs=4 dlm='|';
  input group $ item $ value;
  datalines;
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
;
data want (keep=group value);
  /* first pass: read one BY group and accumulate its sum in v */
  do _N_ = 1 by 1 until (last.group);
    set have;
    by group;
    v + value;
  end;
  /* write the group total, then reset the accumulator */
  value = v;
  output;
  v = 0;
  /* second pass: re-read the same _N_ rows and write the items */
  do _N_ = 1 to _N_;
    set have;
    group = item;
    output;
  end;
run;

How can I add observations to the existing dataset based on dates?

I have a dataset like this:
data have;
input date :date9. index;
format date date9.;
datalines;
31MAR2019 10
30APR2019 12
31MAY2019 15
30JUN2019 14
;
run;
I would like to add observations with dates from the maximum date (hence from 30JUN2019) until 31DEC2019 (by months) with the value of index being the last available value: 14. How can I achieve this in SAS? I want the code to be flexible, thus for every such dataset, take the maximum of date and add monthly observations from that maximum until DEC2019 with the value of index being equal to the last available value (here in the example the value in JUN2019).

An explicit DO loop over the SET statement provides the foundation for a concise solution with no extra worker variables. The END= variable last is temporary and is dropped automatically.
data have;
input date :date9. index;
format date date9.;
datalines;
31MAR2019 10
30APR2019 12
31MAY2019 15
30JUN2019 14
;
data want;
do until (last);
set have end=last;
output;
end;
do last = month(date) to 11; %* repurpose automatic variable last as a loop index;
date = intnx ('month',date,1,'e');
output;
end;
run;
It is always helpful to refresh your understanding. From the SET statement options documentation:
END=variable
creates and names a temporary variable that contains an end-of-file indicator. The variable, which is initialized to zero, is set to 1 when SET reads the last observation of the last data set listed. This variable is not added to any new data set.
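As a small illustration of the END= flag on its own, here is a minimal sketch assuming the have data set from the question; the message text is arbitrary:

```sas
/* Sketch: write the final observation of HAVE to the log using the END= flag. */
data _null_;
    set have end=last;      /* last stays 0 until the final observation is read */
    if last then put 'Final row: ' date= index=;
run;
```

Because last is temporary, it would not appear in an output data set even without a DROP statement.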

You can do it with the END= option in the SET statement and a RETAIN statement.
data want(drop=i tIndex tDate);
set have end=eof;
retain tIndex tDate;
if eof then do;
tIndex=Index;
tDate=Date;
end;
output;
if eof then do;
do i=1 to 12-month(tDate);
index=tIndex;
date = intnx('month',tDate,i,'e');
output;
end;
end;
run;
INPUT:
+-----------+-------+
| date | index |
+-----------+-------+
| 31MAR2019 | 10 |
| 30APR2019 | 12 |
| 31MAY2019 | 15 |
| 30JUN2019 | 14 |
+-----------+-------+
OUTPUT:
+-----------+-------+
| date | index |
+-----------+-------+
| 31MAR2019 | 10 |
| 30APR2019 | 12 |
| 31MAY2019 | 15 |
| 30JUN2019 | 14 |
| 31JUL2019 | 14 |
| 31AUG2019 | 14 |
| 30SEP2019 | 14 |
| 31OCT2019 | 14 |
| 30NOV2019 | 14 |
| 31DEC2019 | 14 |
+-----------+-------+

adding rows given a certain condition

I have a database with 3 columns: ID, Date and Amount. It is ordered by ID and Date. All I want to do is add a row after the latest occurrence of every ID, with the same ID, Date = Date + 1 month and Amount = 0.
As an illustration, I want to go from this:
id | Date  | amount |
A  | 01JAN | 1      |
A  | 01FEB | 1      |
B  | 01FEB | 0      |
B  | 01MAR | 1      |
to this:
id | Date  | amount |
A  | 01JAN | 1      |
A  | 01FEB | 1      |
A  | 01MAR | 0      | <- ADD THIS ROW
B  | 01FEB | 0      |
B  | 01MAR | 1      |
B  | 01APR | 0      | <- ADD THIS ROW
I know I should use intnx, but beyond that I don't really know what to do. I appreciate any input.

Assuming that the DATE variable holds actual date values, you just need to output twice on the last observation in each group.
data want;
set have;
by id;
output;
if last.id then do;
date=intnx('month',date,1,'b');
amount=0;
output;
end;
run;
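To try the step above, the dates need to be real SAS date values. A minimal sketch of a matching input data set, assuming the year 2019 (the question gives only day and month):

```sas
/* Sketch: sample input with real date values; the year 2019 is an assumption. */
data have;
    input id $ date :date9. amount;
    format date date9.;
    datalines;
A 01JAN2019 1
A 01FEB2019 1
B 01FEB2019 0
B 01MAR2019 1
;
run;
```

Because these rows are already sorted by id, the BY statement works without a prior PROC SORT.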

Compare Value of Current Observation with First Observation

I have a set of multiple-choice responses from a survey with 45 questions, and I've placed the correct responses as the first observation in the dataset.
In my DATA step I would like to set values to 0 or 1 depending on whether the variable in each observation matches the same variable in the first observation. I want to replace the response letter (A-D) with the 0 or 1 in the dataset. How do I go about doing that comparison?
I'm not doing any grouping, so I believe I can access the first row using First.x, but I'm not sure how to compare that across each variable (answer1-answer45).
| Id  | answer1 | answer2 | ... through answer45
|:----|--------:|--------:|
| KEY | A | B |
| 2 | A | C |
| 3 | C | D |
| 4 | A | B |
| 5 | D | C |
| 6 | B | B |
Should become:
| Id  | answer1 | answer2 | ... through answer45
|:----|--------:|--------:|
| KEY | A | B |
| 2 | 1 | 0 |
| 3 | 0 | 0 |
| 4 | 1 | 1 |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
Current code for reading in the data:
DATA TEST(drop=name fill answer0);
INFILE SCORES DSD firstobs=2;
length id $4;
length answer1-answer150 $1;
INPUT name $ fill id $ (answer0-answer150) ($);
RUN;
Thanks in advance!

Here's how I might do it. Create a data set that lets PROC COMPARE check each observation against the KEY. The output then has X for a response that does not match the key and '.' for one that matches. You can then use PROC TRANSREG to score the 'X' and '.' values as 0 and 1. PROC TRANSREG also creates macro variables that contain the names of the new variables and their count.
From the log: NOTE: _TRGINDN=2 _TRGIND=answer1D answer2D
data questions;
input id:$3. (answer1-answer2)(:$1.);
cards;
KEY A B
2 A C
3 C D
4 A B
5 D C
6 B B
;;;;
run;
data key;
if _n_ eq 1 then set questions(obs=1);
set questions(keep=id firstobs=2);
run;
proc compare base=key compare=questions(firstobs=2) out=comp outdiff noprint;
id id;
run;
options validvarname=v7;
proc transreg design data=comp(drop=_type_ type=data);
id id;
model class(answer:) / noint;
output out=scored(drop=intercept _:);
run;
%put NOTE: &=_TRGINDN &=_TRGIND;

I don't have my SAS license here at home, so I can't actually test this code. I'll give it my best shot, though ...
First, I'd keep the correct answers in a separate table and then combine it with the answers from the respondents. That also makes the solution scalable, should you have more multiple-choice solutions and answers in the same table, since you'd then be joining on the assignment ID as well.
Now, import all your correct answers into a table answers_correct with column names answer_correct1-answer_correct45.
Then, combine the two tables and determine the outcome for each question.
DATA outcome;
  /* Read the one-row key once. SET variables are retained, so the key is */
  /* available for every respondent. A MERGE without BY would set the key */
  /* to missing after the first row. */
  IF _N_ = 1 THEN SET answers_correct;
  SET answers;
  /* If you later add more questionnaires, MERGE BY the questionnaire ID instead. */
  ARRAY answer(*) answer1-answer45;
  ARRAY answer_correct(*) answer_correct1-answer_correct45;
  LENGTH result1-result45 $1;
  ARRAY result(*) result1-result45;
  DROP i;
  DO i = 1 TO DIM(answer);
    IF answer(i) = answer_correct(i) THEN result(i) = '1';
    ELSE result(i) = '0';
  END;
RUN;
