SAS: How to replace a delimiter and split a column into multiple rows?

I have just started learning SAS and am experimenting with replacing the string "####" with "|" before splitting a cell into multiple rows in SAS Studio.
I have created the example below for this experiment. It is adapted from "How to split a column into multiple rows in SAS", but I couldn't get it to work. The SYSTEM_ID column prints correctly, but ITEM_LIST is not split.
My current output is as follows:
Here's my current code. Please help.
data example1;
input SYSTEM_ID $ ITEM_LIST $ 5-50 ;
datalines;
ID_1 Apple Juice #### Orange #### Banana Milk
;
run;
data example2 (keep=SYSTEM_ID ITEM_LIST_SUB rename=(ITEM_LIST_SUB=ITEM_LIST));
set example1;
ITEM_LIST_TRANS = tranwrd(ITEM_LIST,"####","|");
do i = 1 to countw(ITEM_LIST_TRANS,"|");
ITEM_LIST_SUB = scan(ITEM_LIST,i,"|");
output;
end;
run;
proc print data = example2;run;

There are two small problems with your otherwise fine solution :-)
1. You reference ITEM_LIST in the SCAN function instead of ITEM_LIST_TRANS.
2. In your example data, the data line is indented, so ID_1 becomes part of ITEM_LIST (which is read from columns 5-50).
See if this works for you:
data example1;
input SYSTEM_ID $ ITEM_LIST $ 5-50 ;
datalines;
ID_1 Apple Juice #### Orange #### Banana Milk
;
run;
data example2 (keep=SYSTEM_ID ITEM_LIST_SUB rename=(ITEM_LIST_SUB=ITEM_LIST));
set example1;
ITEM_LIST_TRANS = tranwrd(ITEM_LIST,"####","|");
do i = 1 to countw(ITEM_LIST_TRANS,"|");
ITEM_LIST_SUB = scan(ITEM_LIST_TRANS,i,"|");
output;
end;
run;
proc print data = example2;run;
Result:
Obs SYSTEM_ID ITEM_LIST
1 ID_1 Apple Juice
2 ID_1 Orange
3 ID_1 Banana Milk

The DLMSTR= option on the INFILE statement treats "####" as the delimiter, so the items can be read directly.
data example1;
infile cards dlmstr='####' missover;
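/* $4. reads only the 4-character ID (an assumption based on the sample IDs); the trailing @ holds the line */
input SYSTEM_ID $4. @;
length item $50;
do until(missing(item));
/* read the next '####'-delimited item from the held line */
input item @;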
if not missing(item) then output;
end;
datalines;
ID_1 Apple Juice #### Orange #### Banana Milk
ID_2 Apple Juice #### Orange #### Banana Milk #### Apple Juice #### Orange #### Banana Milk
;
run;
proc print;
run;

Related

How do I avoid spaces/tabs in column names when I use proc transpose?

How do I avoid spaces/tabs in column names when I use proc transpose? The best way to illustrate my problem is by giving an example:
Data tst; input ColA $ ColB; datalines;
Cat1 1
Cat2 2
Cat3 3
; run;
proc transpose data = tst out= tst_out (drop = _name_); id ColA;
run;
When running this code my column names look something like this:
Basically I want the column names to be "Cat1", "Cat2", "Cat3" and not " Cat1", " Cat2", " Cat3".
(If that is not possible, then I have an alternative question: how do I remove the spaces AFTER proc transpose? My real data set has a lot of columns, so I would prefer a method where I don't have to type each column name.)
Just change the VALIDVARNAME option to V7 instead of ANY. It won't remove the leading spaces/tabs, but it will change them to underscores so that the results are valid names.
Example:
data tst;
input ColA $& ColB;
datalines;
Cat 1 1
Cat 2 2
Cat 3 3
;
options validvarname=v7;
proc transpose data=tst out=tst2; id cola ; var colb; run;
proc print;
run;
Result:
Obs _NAME_ Cat_1 Cat_2 Cat_3
1 ColB 1 2 3
PS: When using in-line data in your SAS program, make sure to start the lines of data in the first column. That prevents the accidental inclusion of spaces (or tabs, when using the SAS Studio interface) in the lines of data. Starting the DATALINES (also known as CARDS) statement in the first column also prevents the editor from automatically indenting when you start adding lines of data.
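For the alternative question (cleaning the values rather than converting them to underscores), one option is to strip the whitespace from the ID values before transposing, so that no column has to be renamed by hand. A minimal sketch, assuming the tst data set from the question and that removing every blank and tab from ColA is acceptable (tst_clean is a name made up for this example):

```sas
/* Remove blanks and tabs from the ID values before transposing. */
data tst_clean;
   set tst;
   /* the 's' modifier tells COMPRESS to remove whitespace characters, including tabs */
   ColA = compress(ColA, , 's');
run;

proc transpose data=tst_clean out=tst_out(drop=_name_);
   id ColA;
   var ColB;
run;
```

Note that this also removes embedded blanks, so a value such as "Cat 1" would become "Cat1".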

Data step issue in sas enterprise guide

I need to write a data step in SAS that assigns sequence numbers to a new column, starting from a particular number.
For example right now my table looks like this:
Column 1 Column 2
abc book1
xyz book2
zex book3
I want my table to look like this:
Column 1 Column 2 Column 3
abc book1 151
xyz book2 152
zex book3 153
How do I add Column 3 with a sequence number starting from a particular number?
How about this:
data have;
input Column1 $ Column2 $;
datalines;
abc book1
xyz book2
zex book3
;
data want;
/* start at 151 so the first row read gets 151, as in the desired output */
do Column3 = 151 by 1 until (lr);
set have end=lr;
output;
end;
run;
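A sequence starting at 151 can also be sketched with the automatic variable _N_, which counts data step iterations. This assumes the have data set above and a plain pass over it with no subsetting, so that _N_ matches the row number:

```sas
data want;
   set have;
   /* _N_ is 1 on the first iteration, so the sequence starts at 151 */
   Column3 = 150 + _n_;
run;
```

To start from a different number, change the 150 offset to one less than the first value you want.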

Changing a SAS character variable into a SAS numerical variable?

I have created the following SAS table:
DATA test;
INPUT name$ Group_Number;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
I would like to change group number from a character type into a numeric type.
Here is my attempt:
data test2;
set test;
Group_Number1 = input(Group_Number, best5.);
run;
The problem is that when I execute:
proc contents data = test2;
run;
The output table shows that group number is still of a character type. I think that the problem may be that I have "best5." in my input statement. However I am not 100% sure what is wrong.
How can I fix the solution?
If you have a character variable your code will work. But you don't, you have a numeric variable in your sample data. So either your fake data is incorrect, or you don't have the problem you think you do.
Here's an example that you can run to see this.
*read group_number as numeric;
DATA test_num;
INPUT name$ Group_Number;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
Title 'Group_Number is Numeric!';
proc contents data=test_num;
run;
*read group_number as character;
DATA test_char;
INPUT name$ Group_Number $;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
data test_converted;
set test_char;
group_number_num = input(group_number, 8.);
run;
Title 'Group_Number is Character, Group_Number_Num is Numeric';
proc contents data=test_converted;
run;
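To check the types without reading the PROC CONTENTS output, the VTYPE function returns 'C' for a character variable and 'N' for a numeric one. A small sketch, assuming the test_converted data set created above (type_char and type_num are names made up for this example):

```sas
data _null_;
   set test_converted(obs=1);
   type_char = vtype(group_number);      /* 'C' when group_number is character */
   type_num  = vtype(group_number_num);  /* 'N' when the converted variable is numeric */
   put type_char= type_num=;             /* writes both values to the log */
run;
```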
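Alternatively, try this (it applies PUT to Group_Number, so it assumes Group_Number is numeric, as in the sample data):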
data test2;
set test;
Group_Number1 = input(put(Group_Number,best5.),best5.);
run;

How to make a dataset where the last variables are in one column

I have a dataset:
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
How do I make a dataset where the last variables are in one column?
What I need:
Dataset
You need to look at the CATX function (or its siblings CATS, CATT, CATQ, and CAT). The first argument of CATX is the delimiter:
new_var = catx(' ', var1, var2, var3);
Or, if the variables all start with the same prefix, a couple of other options:
new_var = catx(' ', of var:);
new_var = catx(' ', of var1-var3);
Use proc transpose with your categories in the by statement and the variables to transpose in the var statement:
data have;
input var1 var2 var3 $ var4 $ var5 $;
datalines;
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
;
run;
proc transpose data=have out=want (drop=_name_ rename=(col1 = fruit));
by var1 var2;
var var3 var4 var5;
run;

How to match data in SAS

I have a dataset that contains three variables: var1, var2, and Price. Price is the price of var2, and var1 is a subsample of var2. I want to find the price of each product in var1 by matching the names in var1 against var2.
The data looks like this. Can anyone help me solve this? Many thanks.
Var1     Var2     Price
apple             ?
         apple    2
banana            ?
         banana   2.1
apple             ?
orange            ?
         orange   4
banana            ?
         yoghurt  2
You could do this through SQL by merging your prices onto your dataset by var1/var2:
proc sql ;
create table output as
select a.var1, a.var2, b.price
from input a
left join (select distinct var2, price
from input
where not missing(var2)) as b
on (a.var1=b.var2
or a.var2=b.var2)
;quit ;
Try using a hash table.
data want;
if 0 then set have(keep=var2 price where=(not missing(var2)));
if _n_=1 then do;
declare hash h (dataset:'have(keep=var2 price where=(not missing(var2)))');
h.definekey('var2');
h.definedata('price');
h.definedone();
call missing(var2,price);
end;
set have;
rc=h.find(key:var1);
drop rc;
run;