I have a dataset with two variables, 'shift' and 'scheduled'. The 'shift' variable contains a number of different shift labels that include a time range, for example "ED A 7a-4p"; the 'scheduled' variable contains the number of days that shift is scheduled, so for example there would be a "3" in the cell to represent 3 days.
I created the following code to understand how many shifts are staffed at a given hour.
data ED_A_7a_4p;
set schedule schedule10;
if shift = 'ED A 7a-4p' and Scheduled = '3' then SevenToEightAM = ???;
if shift = 'ED A 7a-4p' and Scheduled = '7' then EightToNineAM = ???;
run;
I would like the created variables, for example 'SevenToEightAM', to equal the number that is in "scheduled" variable column. So if 'scheduled' is 3 I would want 'SevenToEightAM' to equal 3.
The issue is that the value of 'scheduled' varies from row to row, so I can't hard-code it. I was hoping there is a conditional option in SAS that would let me set 'SevenToEightAM' to whatever 'scheduled' is within my dataset.
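You probably want a TABULATE report instead of creating new variables. Since 'scheduled' is a character variable, convert it to a number first. Try:
data have;
set original;
/* scheduled is character, so convert it before summing */
scheduled_num = input(scheduled, best12.);
run;
proc tabulate data=have;
class shift;
var scheduled_num;
/* the analysis variable in the table must be the numeric one */
table shift, scheduled_num*sum;
run;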
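If you do still want the new columns, a plain assignment is enough; no special conditional option is needed. This is only a minimal sketch, assuming 'scheduled' is stored as character (the question compares it to '3') and that every hour column covered by the shift should receive the same value:

```sas
data ED_A_7a_4p;
  set schedule schedule10;
  /* convert the character value to a number once per row */
  scheduled_num = input(scheduled, 12.);
  if shift = 'ED A 7a-4p' then do;
    /* copy whatever 'scheduled' holds on this row */
    SevenToEightAM = scheduled_num;
    EightToNineAM  = scheduled_num;
  end;
run;
```

Rows for any other shift keep missing values in the two new columns.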
I need your help, please!
I'm doing a proc transpose in SAS, from a table that has only unique lines. However, it is returning the following error:
ERROR: The ID value "'OUTROS_CANAIS_Fatura Eletrónica'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
ID_CLIENTE=xxxxxxxxxx
When I check the original table the ID_CLIENTE xxxxxxxxxxx has two lines:
ID_CLIENTE MOTIVO Nr_Solicitacoes
xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - adesão 1
xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - cancelamento 1
I believe it is the '-' that is causing the issue (that comes with the original data), since they are clearly two different values.
Any ideas how to solve this?
EDIT: I've managed to replace the '-' value, however it still returns the same error...
Thank you!!
The Proc TRANSPOSE ID statement turns data values into column names when pivoting data. Column names are limited to 32 characters (and column labels are limited to 256 characters). Your ID values, when truncated to 32 characters, are the same value, and so you get the 'occurs twice' LOG message.
You can add a new variable to distinguish the id values and use the IDLABEL statement to store the original id values in the variable labels.
Example:
idnum is added to the data and is used to distinguish the id values. If you have many id values a hash can be used to dynamically assign a unique idnum for each id value
options validvarname = v7;
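data have;
/* declare the length up front; otherwise id takes the length of the
   first value assigned and the longer second value is truncated */
length id $80;
id = 'xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - adesão';
idnum = 1;
count = 1;
output;
id = 'xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - cancelamento';
idnum = 2;
output;
run;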
proc transpose data=have out=want;
id idnum;
idlabel id;
var count;
run;
proc contents data=work.want;
run;
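The hash approach mentioned above could look roughly like the following. It is only a sketch: the input dataset name raw is hypothetical, and it is assumed to hold the character variable id but no idnum yet.

```sas
data have_numbered;
  set raw;                      /* hypothetical input with variable id */
  if _n_ = 1 then do;
    declare hash h();           /* maps each distinct id to its idnum */
    h.defineKey('id');
    h.defineData('idnum');
    h.defineDone();
  end;
  /* find() returns 0 and fills idnum when this id was seen before */
  if h.find() ne 0 then do;
    nextnum + 1;                /* sum statement: retained across rows */
    idnum = nextnum;
    h.add();
  end;
  drop nextnum;
run;
```

Every row with the same id value gets the same idnum, so the result can be passed to the proc transpose step with id idnum and idlabel id.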
Figured it out!
SAS only allows column names of up to 32 bytes... It was a coincidence that the truncated name ended in '-'.
I am new to SAS and need some help (yes, I've looked through everything; maybe I am just not asking it the right way, but here I am). Let's say I want to create a dataset from sashelp.cars and I want there to be 5 observations for every make:
i.e. 5 obs for Acura, 5 obs for Audi, 5 obs for BMW, etc. I want all the columns returned, but limited to 5 observations per make.
How would I do this with a loop instead of a macro? My actual data set has 93 distinct values and I don't want to use 93 macro calls.
Thanks in advance!!!!
Which 5 obs do you want for each make? The first 5? The last 5? Some sort of random sample?
If it's the latter, proc surveyselect is the way to go:
proc sort data = sashelp.cars out = cars;
by make;
run;
proc surveyselect
data = cars
out = mysample
method = URS
n = 5
outhits;
strata make;
run;
Setting method = URS requests unrestricted random sampling, i.e. sampling with replacement. Because the same row can be selected more than once, and outhits writes one output row per selection, we are guaranteed 5 rows per make in the sample, even if there are fewer than 5 in the input dataset. If you would rather take all available rows in that scenario, use method = srs (simple random sampling, without replacement) together with the selectall option, which only applies to without-replacement methods.
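For comparison, here is a sketch of the without-replacement variant. The output name mysample_srs and the seed value are arbitrary choices for illustration; the seed is only there so the sample is reproducible.

```sas
proc sort data = sashelp.cars out = cars;
  by make;
run;

proc surveyselect
  data = cars
  out = mysample_srs
  method = srs        /* simple random sampling, without replacement */
  n = 5
  selectall           /* take every row when a make has fewer than 5 */
  seed = 12345;       /* arbitrary seed for a reproducible sample */
  strata make;
run;
```

With this version a make that has fewer than 5 rows contributes all of its rows, so the sample can contain fewer than 5 rows for some makes.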
If you want the first 5 per make, then sort as before, then use a data step:
data mysample;
set cars;
by make;
if first.make then rowcount = 0;
rowcount + 1;
if rowcount <= 5;
run;
Getting the last 5 rows per make is very similar - if you have a key column that you can use to reverse the order within each make, that's the simplest option.
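As a rough illustration of that idea, the sketch below uses msrp as a stand-in for the key column (an assumption; use whatever column defines the order in your own data), sorts it in descending order within each make, and then reuses the same counting logic.

```sas
proc sort data = sashelp.cars out = cars_desc;
  by make descending msrp;   /* msrp stands in for your own key column */
run;

data lastfive;
  set cars_desc;
  by make;
  if first.make then rowcount = 0;
  rowcount + 1;              /* running count within each make */
  if rowcount <= 5;          /* keep only the first 5 of the reversed order */
  drop rowcount;
run;
```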
I have a dataset of 60 observations and I'm trying to create a new variable to label the first 30 observations as "A" and the second 30 observations as "B". I assume I would need to use an IF THEN statement for this, but how would I go about creating the new variable and writing the statement to achieve this goal?
Figured out a roundabout way to solve this.
DATA market_new;
SET mydata_old;
id = _N_;
RUN;
Then I used conditional logic inside that DATA step, before the RUN statement:
IF id <31 THEN Letter = 'A';
IF id >=31 THEN Letter = 'B';
The _N_ automatic variable is what you want to use. It counts DATA step iterations and is not stored in the output data set, so I would first capture it permanently in a variable, like row_num=_N_; if the data set gets reordered for some reason, _N_ will no longer point at the same rows. Then you can do your conditional assignments according to the value of row_num.
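Put together in a single step, that could look like the sketch below. The dataset names are taken from the snippet in the question; row_num and letter are just illustrative variable names.

```sas
data market_new;
  set mydata_old;
  row_num = _n_;              /* store the row position permanently */
  length letter $1;
  if row_num <= 30 then letter = 'A';
  else letter = 'B';          /* rows 31 and onwards */
run;
```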
So, I have a significant problem with proc compare. I have two datasets, each with two columns. One column lists table names and the other lists the names of the variables that belong to the table named in the first column. I want to compare the values of the second column based on the values of the first. I somewhat made it work, but the datasets have different sizes because one of them has additional values: a new variable was added to a table, so a new row appears in the middle of that dataset. Unfortunately, proc compare lines up the two datasets row by row and checks the values against each other, so in my case it looks like this:
ds 1 | ds 2
cost | box_nr
other | cost_total
As you can see, a new value box_nr was added to the second dataset that appears above the value that I want it to compare variable cost to (cost_total). So I would like to know if it's possible to compare values (check for differences in character sequence) that have at least minimal similarity - for example 3 letters (cos) or if it's possible to just put values like box_nr at the end suggesting that they don't appear in a certain dataset.
My code:
PROC Compare base=USERSPDS.MIzew compare=USERSPDS.MIwew
out=USERSPDS.result outbase outcomp outdif noprint;
id 'TABLE HD'n;
where ;
run;
proc print data=USERSPDS.result noobs;
by 'TABLE HD'n;
id 'TABLE HD'n;
title 'COMPARISON:';
run;
Untested, but this should get you some of the way.
proc sql;
create table compare as
select
coalesce(a.cola, b.cola) as cola,
a.colb as acolb,
b.colb as bcolb
from dataa as a
full outer join datab as b
on
a.cola = b.cola and
compged(a.colb, b.colb) <= 100;
quit;
Have a look at the compged documentation for further information.
Sounds like you could make a new variable in both datasets, VAR3chars=substr(var,1,3) and then add that variable to your ID statement. I think that should work unless there are duplicate values.
So if one dataset had var="cost" and the other had var="cost_total", they would match on the id so they would be compared and found to be different.
If one dataset had var="box_nr" and the other did not have any values starting with "box", they would not match on the id so compare would find that a record exists for that id in one dataset but not the other.
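A sketch of that approach follows. All names here are hypothetical: base_ds and comp_ds stand for the two datasets, tablename for the table-name column, and varname for the variable-name column (assumed to be a character variable at least 3 characters long).

```sas
/* add the 3-character key to both datasets */
data base_keyed;
  set base_ds;
  length var3chars $3;
  var3chars = substr(varname, 1, 3);
run;

data comp_keyed;
  set comp_ds;
  length var3chars $3;
  var3chars = substr(varname, 1, 3);
run;

/* the ID statement expects both inputs sorted by the ID variables */
proc sort data = base_keyed; by tablename var3chars; run;
proc sort data = comp_keyed; by tablename var3chars; run;

proc compare base = base_keyed compare = comp_keyed
  out = result outbase outcomp outdif noprint;
  id tablename var3chars;
run;
```

As noted above, this only behaves well if the combination of table name and 3-character prefix is unique within each dataset.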
I'm trying to compare multiple sets of data by putting them in separate groups between two numbers. Originally I had statements like,
if COLUMN1 gt 0 and COLUMN1 LE 1000 then PRICE_GROUP = 1000;
I had this going up by 1000 to 100,000. The only problem is that once I counted how many were in each price_group, some price_groups were missing (57,000 had no values so when I would count(Price_group) it would not appear for some groups). The solution I think is to make a table with the bounds for each, and then compare the actual value vs the upper and lower bound.
proc iml;
mat = j(100,2,0);
total = 100000;
mat[1,1] = 0;
mat[1,2] = mat[1,1] + (total/100);
do i = 2 to nrow(mat);
mat[i,1] = mat[i-1,1] + (total/100);
mat[i,2] = mat[i,1] + (total/100);
end;
create dataset from mat;
append from mat;
quit;
This creates the table which I can compare the values, but is there an easier way besides proc iml? I was next going to do a loop to compare each value with the two columns and create a new column on the table to have the count in each bucket. This still seems like an intensive process that is inefficient.
IML isn't a terrible solution, but there are a few others depending on what exactly you're doing.
The most common is proc format. Create a format that manages each bucket, like so (the first range excludes 0, to match your gt 0 and LE 1000 rule):
proc format;
value buckets
0<-1000 = '1000'
1000<-2000 = '2000'
...
other = 'NA';
run;
Then you can either use the format (or informat) to create a new variable with the bucketed value, or even better, use the format on the fly (ie, in proc means or whatnot) which not only means you don't have to rewrite the dataset, but you can swap formats on and off depending on how many buckets you want (say, buckets100 format or buckets20 and whatnot).
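Both uses might look like the sketch below. The format is cut down to three buckets purely for illustration, and have and column1 are assumed to be your dataset and price column.

```sas
proc format;
  value buckets
    0<-1000    = '1000'
    1000<-2000 = '2000'
    2000<-3000 = '3000'
    other      = 'NA';
run;

/* option 1: store the bucket in a new character variable */
data want;
  set have;
  price_group = put(column1, buckets.);
run;

/* option 2: apply the format on the fly, no new dataset needed */
proc freq data = have;
  format column1 buckets.;
  tables column1;
run;
```

Swapping in a different format name on the format statement in option 2 changes the bucketing without touching the data.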
Second, your specific question looks like it's solvable just using math:
%let total = 100000;
data want;
set have;
/* ceil puts 1000 in the 1000 bucket and 1001 in the 2000 bucket */
bucket = (&total/100) * ceil(column1 / (&total/100));
run;
although obviously that doesn't work for every example.
Third, you could use a hash lookup table, if you are unable to use formats (such as there are two or more elements that determine the bucket). If that's useful I can expand on that, or just google about as those are very commonly used for lookups in SAS. That's the closest solution to the IML solution inside a regular datastep.
Create another table with groups:
data group_table;
do price_group=1000 to 100000 by 1000;
output;
end;
run;
Then left join this new table to your grouping/comparison table (called group_counts here), using price_group as the key, so that every group appears even when it has no counts:
proc sql;
create table price_group_compare as
select L.price_group, R.group1_count, R.group2_count
from group_table as L
left join group_counts as R
on L.price_group = R.price_group;
quit;