I'd like to replicate and modify specific rows in the table.
before:
xyz_id | letter | Col_1 | Col_2|
1 | Z | V1 | W1 |
2 | Z | V2 | W2 |
3 | Z | V3 | W3 |
after:
xyz_id | letter | Col_1 | Col_2|
1 | A | V1.1 | W1.1 |
1 | B | V1.1 | W1.1 |
1 | C | V1.1 | W1.1 |
2 | A | V2.1 | W2.1 |
2 | B | V2.1 | W2.1 |
2 | C | V2.1 | W2.1 |
3 | A | V3.1 | W3.1 |
3 | B | V3.1 | W3.1 |
3 | C | V3.1 | W3.1 |
I've prepared the following code:
data test2;
set test;
array letters {8} $8 _temporary_ ('A', 'B', 'C');
array weights {8} _temporary_ (1,2,3);
array nvars {2} Col_1 Col_2;
do i = 1 to 8;
letter = letters(i);
do j=1 to 2;
nvar{j} = nvar{j} * weights(i);
end;
output;
end;
drop i;
run;
but it doesn't work. Any suggestions?
You reference the nvar array in the j loop, but the array is called nvars.
Col_1 and Col_2 are characters (e.g. 'V1'), so you cannot multiply these by a number and expect a valid result.
Your letters and weights arrays do not have enough values defined for the number of elements (3 vs 8).
Related
I am trying to run a code that should work on tables created considering different factors. As these factors can be more than 1, I decided to create a macro %let to list them:
%let list= factor1 factor2 ...;
What I would like to do is run a code to create these tables using different factors. For each factor, I computed using proc means the mean and the standard deviation, so I should have the variables &list._mean and &list._stddev in the table created by the proc means for each factor. This table is labelled as t2 and I need to join to another table, t1. From t1 I am considering all the variables.
My main difficulties are, therefore, in the proc sql:
proc sql;
create table new_table as
select t1.*
, t2.&list._mean as mean
, t2.&list._stddev as stddev
from table1 as t1
left join table2 as t2
on t1.time=t2.time
order by t2.&list.
quit;
This code is returning an error and I think because I am considering t2.factor1 factor2, i.e. t2 is only applied to the first factor, not to the second one.
What I would expect is the following:
proc sql;
create table new_table as
select t1.*
, t2.factor1._mean as mean
, t2.factor1._stddev as stddev
from table1 as t1
left join table2 as t2
on t1.time=t2.time
order by t2.factor1.
quit;
and another one for factor2.
UPDATE CODE:
%macro test_v1(
_dtb
,_input
,_output
,_time
,_factor
);
data &_input.;
set &_dtb..&_input.;
keep &_col_period. &_factor.;
run;
proc sort data = work.&_input.
out = &_input._1;
by &_factor. &_time.;
run;
%put ERROR: 2
proc means data=&_input._1 nonobs mean stddev;
class &_time.;
var &_factor.;
output out=&_input._n (drop=_TYPE_) mean= stddev= /autoname ;
run;
%put ERROR: 3
proc sql;
create table work.&_input._data as
select t1.*
,t2.&_factor._mean as mean
,t2.&_factor._stddev as stddev
from &_input. as t1
left join &_input._n as t2
on t1.&_time.=t2.&_time.
order by &_factor.;
quit;
%mend test_v1;
Then my question is on how I can consider multiple factors, defined into a macro as a list, as columns of tables and as input data into a macro (for example: %test(dataset, tablename, list).
I suspect that trying to use PROC SQL is what is making the problem hard. If you stick to just using normal SAS syntax your space delimited list of variable names is easy to use.
So taking your code and tweaking it a little:
%macro test_v1
(_dtb /* Input libref */
,_input /* Input member name */
,_output /* Output dataset */
,_time /* Class/By variable(s) */
,_factor /* Analysis variable(s) */
);
proc sort data= &_dtb..&_input. out=_temp1;
by &_time. ;
run;
proc means data=_temp1 nonobs mean stddev;
by &_time.;
var &_factor.;
output out=_temp2 (drop=_TYPE_) mean= stddev= /autoname ;
run;
data &_output. ;
merge _temp1 _temp2 ;
by &_time.;
run;
%mend test_v1;
We can then test it using SASHELP.CLASS by using SEX as the "time" variable and HEIGHT and WEIGHT as the analysis variables.
%test_v1(_dtb=sashelp,_input=class,_output=want,_time=sex,_factor=height weight);
You can try to add macro loop to your macros by scanning list of factors. It could look like:
%macro test(list);
%do i=1 to %sysfunc(countw(&list,%str( )));
%let factorname=%scan(&list,&i,%str( ));
/* if macro variable list equals factor1 factor2 then there would be
two iterations in loop, i=1 factorname=factor1 and i=2 factorname=2*/
/*your code here*/
%end
%mend test;
UPDATE:
%macro test(_input, _output, factors_list); %macro d; %mend d;
%do i=1 %to %sysfunc(countw(&factors_list,%str( )));
%let tfactor=%scan(&factors_list,&i,%str( ));
proc sort data = work.&_input.
out = &_input._1;
by &factors_list. time;
run;
proc means data=&_input._1 nonobs mean stddev;
class time;
var &tfactor.;
output out=&_input._num (drop=_TYPE_) mean= stddev= /autoname ;
run;
proc sql;
create table &_output._&tfactor as
select t1.*
, t2.&tfactor._mean as mean
, t2.&tfactor._stddev as stddev
from &_input as t1
left join &_input._num as t2
on t1.time=t2.time
order by t1.&tfactor;
quit;
%end;
%mend test;
%test(have,newdata,factor1 factor2);
Have dataset:
+------+---------+---------+
| time | factor1 | factor2 |
+------+---------+---------+
| 1 | 12345 | 1234 |
| 2 | 123 | 12 |
| 3 | 1 | -1 |
| 4 | -12 | -123 |
| 5 | -1234 | -12345 |
| 6 | 9876 | 987 |
| 7 | 98 | 8 |
| 8 | 9 | 7 |
| 1 | 1234 | 123 |
| 2 | 12 | 1 |
| 3 | 12 | -12 |
| 4 | -123 | -1234 |
| 5 | -12345 | -123456 |
| 6 | 987 | 98 |
| 7 | 9 | -9 |
| 8 | 1234 | 1234 |
+------+---------+---------+
NEWDATA_FACTOR1:
+------+---------+---------+---------+--------------+
| time | factor1 | factor2 | mean | stddev |
+------+---------+---------+---------+--------------+
| 5 | -12345 | -123456 | -6789.5 | 7856.6634458 |
| 5 | -1234 | -12345 | -6789.5 | 7856.6634458 |
| 4 | -123 | -1234 | -67.5 | 78.488852712 |
| 4 | -12 | -123 | -67.5 | 78.488852712 |
| 3 | 1 | -1 | 6.5 | 7.7781745931 |
| 7 | 9 | -9 | 53.5 | 62.932503526 |
| 8 | 9 | 7 | 621.5 | 866.20580695 |
| 3 | 12 | -12 | 6.5 | 7.7781745931 |
| 2 | 12 | 1 | 67.5 | 78.488852712 |
| 7 | 98 | 8 | 53.5 | 62.932503526 |
| 2 | 123 | 12 | 67.5 | 78.488852712 |
| 6 | 987 | 98 | 5431.5 | 6285.472178 |
| 1 | 1234 | 123 | 6789.5 | 7856.6634458 |
| 8 | 1234 | 1234 | 621.5 | 866.20580695 |
| 6 | 9876 | 987 | 5431.5 | 6285.472178 |
| 1 | 12345 | 1234 | 6789.5 | 7856.6634458 |
+------+---------+---------+---------+--------------+
NEWDATA_FACTOR2:
+------+---------+---------+----------+--------------+
| time | factor1 | factor2 | mean | stddev |
+------+---------+---------+----------+--------------+
| 5 | -12345 | -123456 | -67900.5 | 78567.341564 |
| 5 | -1234 | -12345 | -67900.5 | 78567.341564 |
| 4 | -123 | -1234 | -678.5 | 785.5956339 |
| 4 | -12 | -123 | -678.5 | 785.5956339 |
| 3 | 12 | -12 | -6.5 | 7.7781745931 |
| 7 | 9 | -9 | -0.5 | 12.02081528 |
| 3 | 1 | -1 | -6.5 | 7.7781745931 |
| 2 | 12 | 1 | 6.5 | 7.7781745931 |
| 8 | 9 | 7 | 620.5 | 867.62002052 |
| 7 | 98 | 8 | -0.5 | 12.02081528 |
| 2 | 123 | 12 | 6.5 | 7.7781745931 |
| 6 | 987 | 98 | 542.5 | 628.61792847 |
| 1 | 1234 | 123 | 678.5 | 785.5956339 |
| 6 | 9876 | 987 | 542.5 | 628.61792847 |
| 1 | 12345 | 1234 | 678.5 | 785.5956339 |
| 8 | 1234 | 1234 | 620.5 | 867.62002052 |
+------+---------+---------+----------+--------------+
I have simplified my problem to solve. Lets suppose I have three tables. One containing data and specific codes that identify objects lets say Apples.
+-------------+------------+-----------+
| Data picked | Color code | Size code |
+-------------+------------+-----------+
| 1-8-2018 | 1 | 1 |
| 1-8-2018 | 1 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 2 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 3 | 3 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 5 | 3 |
| 1-8-2018 | 6 | 1 |
| 1-8-2018 | 6 | 2 |
| 1-8-2018 | 6 | 2 |
+-------------+------------+-----------+
And i have two related helping tables to help understand the codes (their relationships are inactive in the model due to ambiguity with other tables in the real case).
+-----------+--------+
| Size code | Size |
+-----------+--------+
| 1 | Small |
| 2 | Medium |
| 3 | Large |
+-----------+--------+
and
+------------+----------------+-------+
| Color code | Color specific | Color |
+------------+----------------+-------+
| 1 | Light green | Green |
| 2 | Green | Green |
| 3 | Semi green | Green |
| 4 | Red | Red |
| 5 | Dark | Red |
| 6 | Pink | Red |
+------------+----------------+-------+
Lets say that I want to create an extra column in the original table to determine which apples are class A and class B given that medium green Apples are class A and large Red apples are class B, the other remain blank as the example below.
+-------------+------------+-----------+-------+
| Data picked | Color code | Size code | Class |
+-------------+------------+-----------+-------+
| 1-8-2018 | 1 | 1 | |
| 1-8-2018 | 1 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 2 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 3 | 3 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 5 | 3 | B |
| 1-8-2018 | 6 | 1 | |
| 1-8-2018 | 6 | 2 | |
| 1-8-2018 | 6 | 2 | |
+-------------+------------+-----------+-------+
What's the proper DAX to use given the relationships are initially inactive. Preferably solvable without creating any further additional columns in any table. I already tried codes like:
CALCULATE (
"A" ;
FILTER ( 'Size Table' ; 'Size Table'[Size] = "Medium");
FILTER ( 'Color Table' ; 'Color Table'[Color] = "Green")
)
And many variations on the same principle
Given that the relationships are inactive, I'd suggest using LOOKUPVALUE to match ID values on the other tables. You should be able to create a calculated column as follows:
Class =
VAR Size = LOOKUPVALUE('Size Table'[Size],
'Size Table'[Size code], 'Data Table'[Size code])
VAR Color = LOOKUPVALUE('Color Table'[Color],
'Color Table'[Color code], 'Data Table'[Color code])
RETURN SWITCH(TRUE(),
(Size = "Medium") && (Color = "Green"), "A",
(Size = "Large") && (Color = "Red"), "B", BLANK())
If your relationships are active, then you don't need the lookups:
Class = SWITCH(TRUE(),
(RELATED('Size Table'[Size]) = "Medium") &&
(RELATED('Color Table'[Color]) = "Green"),
"A",
(RELATED('Size Table'[Size]) = "Large") &&
(RELATED('Color Table'[Color]) = "Red"),
"B",
BLANK())
Or a bit more elegantly written (especially for more classes):
Class =
VAR SizeColor = RELATED('Size Table'[Size]) & " " & RELATED('Color Table'[Color])
RETURN SWITCH(TRUE(),
SizeColor = "Medium Green", "A",
SizeColor = "Large Red", "B",
BLANK())
I have two table looks like and I want to add column score to tableA from tableB, then get tableC, how to do in SAS?
the only rule is to add a column in tableA name "score " and its value is same as column "score" in tableB (which are all the same in tableB)
+----+---+---+---+
| id | b | c | d |
+----+---+---+---+
| 1 | 5 | 7 | 2 |
| 2 | 6 | 8 | 3 |
| 3 | 7 | 8 | 1 |
| 4 | 5 | 7 | 2 |
| 5 | 6 | 8 | 3 |
| 6 | 7 | 8 | 1 |
+----+---+---+---+
tableA
+---+---+-------+
| e | f | score |
+---+---+-------+
| 3 | 7 | 11 |
| 4 | 6 | 11 |
| 5 | 5 | 11 |
+---+---+-------+
tableB
+----+---+---+---+-------+
| id | b | c | d | score |
+----+---+---+---+-------+
| 1 | 5 | 7 | 2 | 11 |
| 2 | 6 | 8 | 3 | 11 |
| 3 | 7 | 8 | 1 | 11 |
| 4 | 5 | 7 | 2 | 11 |
| 5 | 6 | 8 | 3 | 11 |
| 6 | 7 | 8 | 1 | 11 |
+----+---+---+---+-------+
tableC
If the "id" is present in both tables, you can use the following to create Table C:
PROC SQL;
CREATE TABLE tableC AS
SELECT a.*, b.score
FROM tableA a JOIN tableB b
ON a.id = b.id;
QUIT;
Please confirm that this is what you need?
Table 1
| | 1 Jan 2018 | 2 Jan 2018 | 3 Jan 2018 | 4 Jan 2018 | 5 Jan 2018 |
|----|------------|------------|------------|------------|------------|
| A1 | | | | | |
| A2 | | | | | |
| A3 | | | | | |
| A4 | | | | | |
| A5 | | | | | |
| A6 | | | | | |
Table 2
|----|----------|-----------|-----------|-----------|-----------|
| A1 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A2 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A3 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A4 | 3-Jan-18 | 6-Jan-18 | 10-Jan-18 | 13-Jan-18 | 17-Jan-18 |
| A5 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A6 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
IF MATCH
=IF(MATCH(A2,$A$27:$A$54,0) & MATCH(C1,$B$27:$S$54,0),"1","")
Getting #N/A error out of it
Trying to get apply formula onto the cells in Table 1 to lookup values in Table 2
If it matches, output is 1, else 0
Table & Image above to clearly illustrate & experiment out.
Thanks in advance (:
Applied formula & Output
Firstly, MATCH() returns a number that represents the position of a found match so your formula says IF(1 & 1,"1","") for your first potential match, there is no logical here.
The first ammendment would be to force a True / False output: =IF(AND(ISNUMBER(MATCH()),ISNUMBER(MATCH())),"1","")
You still have the issue that the second match is referencing the entire range of resuts though, you really want this to only look through the row that meets the first criteria, for this we will use an array formula to build the array you want to use:
EDIT: You can't buld an array from Match as it returns a single integer:
=LARGE(IF(B$1=IF($A2=$A$27:$A$54,$B$27:$S$54),1,0),1)
This is an array formula, while still in the formula bar hit Ctrl+Shift+Enter
The Inner IF() statemnet is building an array of each row, providing values where column A matches and FALSE where it doesn't. The outter IF() statement is then evaluating 0 or 1 whether it finds the date in that new array...
I have wrapped this in a LARGE() to return the first largest number so if a single match is found it will return that 1. If you want the blank you can wrap the whole thing in another IF() statement; IF([formula]=0,"",1)
I have a dataset containing different product values generated in each simulation, with the following layout:
+------------+-------+-------+-------+
| simulation | v1 | v2 | v3 |
+------------+-------+-------+-------+
| 1 | 0,500 | 0,400 | 0,300 |
| 2 | 0,900 | 0,800 | 0,800 |
| 3 | 0,100 | 0,200 | 0,300 |
+------------+-------+-------+-------+
The variable names v1, v2, v3 are labelled as product ids and are not displayed at the header of the dataset. I need to reshape this dataset to long format so it would like this:
+------------+----+----------+-------+
| simulation | id | label | value |
+------------+----+----------+-------+
| 1 | v1 | 01020304 | 0,500 |
| 1 | v2 | 01020305 | 0,400 |
| 1 | v3 | 01020306 | 0,300 |
| 2 | v1 | 01020304 | 0,900 |
| 2 | v2 | 01020305 | 0,800 |
| 2 | v3 | 01020306 | 0,800 |
| 3 | v1 | 01020304 | 0,100 |
| 3 | v2 | 01020305 | 0,200 |
| 3 | v3 | 01020306 | 0,300 |
+------------+----+----------+-------+
The standard code reshape long v , i(simulation) j(_count) is not applicable in this case, as I need to reshape the variable labels and keep them in the dataset as variable values. Was wondering if there exists a way to make this kind of transposition with variable labels?
Only one idea seems needed here. If your variable labels would otherwise disappear, save them in a local macro before a reshape and then apply them as value labels afterwards. The FAQ cited earlier gives the flavour.
Sandpit to play in:
input simulation v1 v2 v3
simulat~n v1 v2 v3
1. 1 0.500 0.400 0.300
2. 2 0.900 0.800 0.800
3. 3 0.100 0.200 0.300
4. end
label var v1 "01020304"
label var v2 "01020305"
label var v3 "01020306"
Sample code:
forval j = 1/3 {
local labels `labels' `j' "`: var label v`j''"
}
reshape long v, i(simulation)
(note: j = 1 2 3)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 3 -> 9
Number of variables 4 -> 3
j variable (3 values) -> _j
xij variables:
v1 v2 v3 -> v
-----------------------------------------------------------------------------
rename v value
label def label `labels'
rename _j label
gen id = label
label val label label
list
+----------------------------------+
| simula~n label value id |
|----------------------------------|
1. | 1 01020304 .5 1 |
2. | 1 01020305 .4 2 |
3. | 1 01020306 .3 3 |
4. | 2 01020304 .9 1 |
5. | 2 01020305 .8 2 |
|----------------------------------|
6. | 2 01020306 .8 3 |
7. | 3 01020304 .1 1 |
8. | 3 01020305 .2 2 |
9. | 3 01020306 .3 3 |
+----------------------------------+