I am trying to concatenate three string variables.
Data X ;
a = "A" ;
b = "B" ;
c = "C" ;
z = catx ( '0D0A'x, a, b, c ) ;
run;
I am trying to display the string values like this in the final dataset, so that the values appear one below the other -
A
B
C
But when I use '0D0A'x as the delimiter, the string appears as ABC. I have to display the z variable in Excel. If I were writing the same value to an HTML file, I would have used "\n" as the delimiter in the CATX function. Is there a way to introduce newline characters?
I adapted your example slightly:
libname test excel "%sysfunc(pathname(work))\text.xls";
Data test.X ;
a = "A" ;
b = "B" ;
c = "C" ;
z = catx ( '0D0A'x, a, b, c ) ;
run;
libname test clear;
x "explorer ""%sysfunc(pathname(work))\text.xls""";
If you use this approach, you get a value in cell D2 which contains the line breaks as expected. However, in order for them to display correctly, you have to enable the 'wrap text' option for the cell formatting.
I created a SAS function using fcmp to calculate the Jaccard distance between two strings. I do not want to use macros, as I'm going to apply it to multiple variables across a large dataset. The problem is that the list of substrings I get is incomplete.
proc fcmp outlib=work.functions.func;
function distance_jaccard(string1 $, string2 $);
n = length(string1);
m = length(string2);
ngrams1 = "";
do i = 1 to (n-1);
ngrams1 = cats(ngrams1, substr(string1, i, 2) || '*');
end;
/*ngrams1= ngrams1||'*';*/
put ngrams1=;
ngrams2 = "";
do j = 1 to (m-1);
ngrams2 = cats(ngrams2, substr(string2, j, 2) || '*');
end;
endsub;
options cmplib=(work.functions);
data test;
string1 = "joubrel";
string2 = "farjoubrel";
jaccard_distance = distance_jaccard(string1, string2);
run;
I expected ngrams1 and ngrams2 to contain all the substrings of length 2; instead I got this:
ngrams1=jo*ou*ub
ngrams2=fa*ar*rj
If you want real help with your algorithm you need to explain in words what you want to do.
I suspect your problem is that you never defined how long your new character variables NGRAMS1 and NGRAMS2 should be. From the output you show, it appears that FCMP defaulted them to length $8.
To define a variable you need to use a LENGTH statement (or an ATTRIB statement with the LENGTH= option) before you start referencing the variable.
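To illustrate the LENGTH fix, here is a minimal sketch of a function that only builds the bigram list. The function name bigram_list and the length of 200 are assumptions for illustration and are not part of the original code; choose a length large enough for your longest string (about three bytes per character once the '*' separator is included).

```sas
proc fcmp outlib=work.functions.func;
  /* Hypothetical helper: returns the 2-character substrings, '*'-separated */
  function bigram_list(string1 $) $ 200;
    /* Declare the length BEFORE the first reference to the variable */
    length ngrams $ 200;
    ngrams = '';
    n = length(string1);
    do i = 1 to n - 1;
      ngrams = cats(ngrams, substr(string1, i, 2), '*');
    end;
    return (ngrams);
  endsub;
run;

options cmplib=work.functions;

data _null_;
  length grams $ 200;
  grams = bigram_list("joubrel");
  put grams=;   /* write the bigram list to the log */
run;
```

The same LENGTH statement, listing both ngrams1 and ngrams2, would go at the top of distance_jaccard.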
There is a scenario where I receive a string to the bigquery function and need to use it as a column name.
here is the function
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (
(SELECT column from WORK.temp WHERE rownumber = row_number)
);
When I call this function as select METADATA.GET_VALUE("TXCAMP10",149); I get the value TXCAMP10, so it is evidently processed as SELECT "TXCAMP10" from WORK.temp WHERE rownumber = 149. What I need is SELECT TXCAMP10 from WORK.temp WHERE rownumber = 149, which returns a value from the temp table; let's suppose that value is A.
So ultimately I need the value A instead of the column name, i.e. TXCAMP10.
I tried using execute immediate, like execute immediate("SELECT" || column || "from WORK.temp WHERE rownumber =" ||row_number), taken from a Stack Overflow post, to resolve this issue, but it turns out I can't use it in a function.
How do I achieve required result?
I don't think you can achieve this result with the help of UDF in standard SQL in BigQuery.
But it is possible to do this with stored procedures in BigQuery and EXECUTE IMMEDIATE statement. Consider this code, which simulates the situation you have:
create or replace table d1.temp(
c1 int64,
c2 int64
);
insert into d1.temp values (1, 1), (2, 2);
create or replace procedure d1.GET_VALUE(column STRING, row_number int64, out result int64)
BEGIN
EXECUTE IMMEDIATE 'SELECT ' || column || ' from d1.temp where c2 = ?' into result using row_number;
END;
BEGIN
DECLARE result_c1 INT64;
call d1.GET_VALUE("c1", 1, result_c1);
select result_c1;
END;
After some research and trial and error, I used this workaround to solve the issue. It may not be the best solution when you have many columns, but it works.
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column_name STRING, row_number int64) AS (
(SELECT case
when column_name = 'a' then a
when column_name = 'b' then b
when column_name = 'c' then c
when column_name = 'd' then d
when column_name = 'e' then e
end from WORK.temp WHERE rownumber = row_number)
);
And this gives the required results.
Point to note: all the columns you use in the CASE expression must have the same data type (or types that BigQuery can coerce to a common supertype), otherwise it won't work.
Essentially, what I would like to do is use the min/max function while altering a table. I am altering a table, adding a column, and then having that column set to a combination of a min/max function. In SAS, however, you can't use summary functions. Is there a way to go around this?
There are many more inputs but for the sake of clarity, a condensed version is below! Thanks!
%let variable = 42
alter table X add Z float;
update X
set C = min(max(0,500 - %sysevalf(variable)),0);
First, let's remove the %sysevalf() calls, since they are not needed, and format the code for readability:
alter table claims.simulation add Paid_Claims_NoISL float;
update claims.simulation
set Paid_Claims_NoISL
= min(
max(0
, Allowed_Claims -&OOPM
, min(Allowed_Claims
,&Min_Paid+ max(Allowed_Claims - &Deductible * &COINS
,0
)
)
, &Ind_Cap_Claim
)
);
Notice that the first min() only has 1 argument. That is causing your ERROR. SAS thinks that because it only has 1 input, you want to summarize a column, which is not allowed in an update.
Just take that out and it should work:
alter table claims.simulation add Paid_Claims_NoISL float;
update claims.simulation
set Paid_Claims_NoISL
= max(0
, Allowed_Claims -&OOPM
, min(Allowed_Claims
,&Min_Paid+ max(Allowed_Claims - &Deductible * &COINS
,0
)
)
, &Ind_Cap_Claim
);
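The distinction this answer relies on can be seen on a toy table. The sketch below assumes a hypothetical table work.demo with columns a and b, which are not from the original question: with two or more arguments, MIN works across the values in each row, while with a single argument it is treated as a summary function over the column.

```sas
/* Hypothetical two-row table used only for this illustration */
data work.demo;
  input a b;
  datalines;
1 5
7 2
;
run;

proc sql;
  alter table work.demo add row_min float;

  /* Two arguments: row-wise minimum, allowed in UPDATE */
  update work.demo set row_min = min(a, b);

  /* One argument: column-wise summary, fine in a SELECT */
  select min(a) as col_min from work.demo;
quit;
```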
To reference the value of a macro variable you need to use &.
%let mvar = 42;
proc sql;
update X set C = min(max(0,500 - &mvar),0);
quit;
Note there is no need to use the macro function %SYSEVALF() since SAS can more easily handle the case when the value &mvar is an expression than the macro processor can.
%let mvar = 500 - 42;
proc sql;
update X set C = min(max(0,&mvar),0);
quit;
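As a small sketch of why the macro function is unnecessary: the macro variable is plain text that is substituted into the code before SAS compiles it, so the DATA step (or PROC SQL) evaluates the expression itself. The DATA _NULL_ step here is only an illustration and is not part of the original update.

```sas
%let mvar = 500 - 42;

/* The macro variable holds the text "500 - 42", not the number 458 */
%put mvar resolves to: &mvar;

/* %SYSEVALF is only needed when the macro processor itself must do the arithmetic */
%put evaluated by the macro processor: %sysevalf(&mvar);

data _null_;
  /* After substitution this reads: c = min(max(0, 500 - 42), 0); */
  c = min(max(0, &mvar), 0);
  put c=;
run;
```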
This issue drives me crazy. In SAS, when I want to concat a string, the variable which will be assigned to the result cannot be used in the input.
DATA test1;
LENGTH x $20;
x = "a";
x = x || "b";
RUN;
Result: x = "a";
DATA test2;
LENGTH x $20;
y = "a";
x = y || "b";
RUN;
Result: x = "ab";
DATA test3;
LENGTH x $20;
x = "a";
y = x;
x = y || "b";
RUN;
Result: x = "a";
The last one is so strange: x is not even directly involved in the concatenation.
This does not make sense, because 1) you can do other operations this way, e.g. transtrn and substr, and 2) SAS does not give any warning message.
Why?
It's because the length of X is set to 20, so after x = "a" the value is the letter followed by 19 trailing blanks. When you concatenate "b", it lands in position 21 and is truncated when the result is stored back into the 20-character variable. Either trim x before the concatenation operator or use catt. You can use lengthc to see the defined length of the character variable.
DATA test1;
LENGTH x $20;
x = "a";
len=lengthc(x);
x = trim(x) || "b";
*x = catt(x, "b");
RUN;
proc print data=test1;
run;
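The third case in the question (test3) follows from the same rule. As a minimal sketch, assuming nothing beyond the variables already used there: y gets its length from its first assignment, so it inherits the 20 bytes of x, trailing blanks included.

```sas
data _null_;
  length x $20;
  x = "a";
  y = x;                   /* y is created with length 20, copied from x */

  stored_x  = lengthc(x);  /* defined length, trailing blanks counted */
  visible_x = length(x);   /* position of the last non-blank character */
  stored_y  = lengthc(y);

  x = catt(y, "b");        /* catt strips the trailing blanks of y first */
  put stored_x= visible_x= stored_y= x=;
run;
```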
You could also use substr() on the left-hand side of the assignment. Something like:
substr(x,10,1) = 'a';
to set the 10th character to 'a'. To build the string up character by character, loop over the positions in x and use the loop index in place of the 10.
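A minimal sketch of that approach, applied to the question's example. Writing at position length(x) + 1 is an assumption about how one would append; it presumes x is non-blank and still has room within its defined length.

```sas
data _null_;
  length x $20;
  x = "a";

  /* Overwrite the first blank position instead of concatenating */
  substr(x, length(x) + 1, 1) = "b";

  put x=;
run;
```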
I'm trying to create a code to run a simple perceptron in SAS base.
I'd like to print in each iteration (or store in a table) the result and the target, but I get an error when I try to print y[i,]:
proc iml;
use percept; read all var{x1 X2} into X;
read all var{Y} into Y;
W={0,0}; b=0; k=0; L=nrow(X); eta=.8; o=0;
print w b k L eta;
do step = 1 to 6;
mistakes=0;
do i=1 to L;
o=(X[i, ]*W + b);
if Y[i, ]*o <= 0 then do;
W = W + eta*(Y[i, ]-o)*X[i,]`;
b = b + eta*(Y[i, ]-o)*1;
k=k+1; mistakes=mistakes+1;
print o Y[i, ] W b k mistakes;
end;
end;
end;
I get the error:
Syntax error, expecting one of the following: C, COLNAME, F, FORMAT,
L, LABEL, R,
ROWNAME, ], |). The option or parameter is not recognized and will be ignored.
Is there another way to print the target?
Thanks a lot!
Per the documentation on PRINT, you need to do it like this:
print (Y[i,]);
This is because PRINT overloads [ ] to indicate formatting, rownames/colnames, etc., which is rather silly (but presumably done to imitate some other language?). So you just need to wrap the subscripted expression Y[i,] in parentheses, as above.
Here's a silly example.
proc iml;
use sashelp.class;
read all var{name,sex} into class;
read all var{height,weight,age} into classN;
y = mean(classN[,2]);
print class;
print (class[1:2,]);
print y (class[1:2,]);
quit;