SAS catx function removes spaces - sas

I have a dataset which has 1 column and n rows like this:
Dataset1:
Column1
--------
AAA AAA
BBB BBB
CCC CCC
DDD DDD
EEE EEE
I want to turn this data into a single row like:
"AAA AAA"n "BBB BBB"n "CCC CCC"n "DDD DDD"n "EEE EEE"n
I will do this in a macro.
I tried the catx function, but it removes the spaces from the data.
I used a do loop like this:
.
.
.
.
data _NULL_;
if &I^=1 then do;
frstclmn=&frstclmn||""""||"&clmn"||""""||"n ";
end;
run;
.
.
.
But I couldn't assign a variable to itself inside a do loop in the data step.
How can I do this? Thanks.
Edit:
%MACRO result;
data _NULL_;
set &LIB_NAME..column_list;
retain namelist;
length namelist $5000;
namelist=catx(' ',namelist,cats('"',name,'"n'));
run;
---how can I use "namelist" variable here ? out of data statement.---
%MEND result;
This code runs perfectly. Now I want to use the namelist variable outside this data step. If I print it with %put &namelist=; it shows the wrong result in the macro. I want to use the value of this variable in other statements in the macro.

It's not clear to me what output you seek. Perhaps this will give you some hints.
data names;
input name $32.;
cards;
AAA AAA
BBB BBB
CCC CCC
DDD DDD
EEE EEE
;;;;
run;
proc sql noprint;
select nliteral(name) into :namelist separated by ' ' from names;
quit;
run;
%put NOTE: &=namelist;
NOTE: NAMELIST="AAA AAA"N "BBB BBB"N "CCC CCC"N "DDD DDD"N "EEE EEE"N

The SQL method that data _null_ shows above is the better method, but if you're going to do it in a data step, use ' ' as your delimiter.
data _NULL_;
set sashelp.class;
retain namelist;
length namelist $500;
namelist=catx(' ',namelist,cats('"',name,'"n'));
put namelist=;
run;
Of course you could use quote, or nliteral, both to better effect.
data _NULL_;
set sashelp.class;
retain namelist;
length namelist $500;
namelist=catx(' ',namelist,nliteral(name));
put namelist=;
run;
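To use the list outside the data step (the question's edit), one option is to copy it into a macro variable on the last observation. A minimal sketch, assuming the same sashelp.class input as above; the END= variable name last is arbitrary:

```sas
data _null_;
  set sashelp.class end=last;        /* END= flags the final observation */
  length namelist $500;
  retain namelist;
  namelist = catx(' ', namelist, nliteral(name));
  /* Export the finished list once, after the last row has been appended */
  if last then call symputx('namelist', namelist);
run;

%put NOTE: &=namelist;
```

Inside a macro, CALL SYMPUTX may create the variable in the macro's local symbol table. Passing 'G' as a third argument forces a global macro variable if the value is needed after the macro ends.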

Related

How to import a txt file with single quote mark in a variable and another in another variable, when there are also variables with null values

This is a follow-up of my previous question:
How to import a txt file with single quote mark in a variable and another in another variable.
The solution there works perfectly as long as there is no variable whose values could be null.
In this latter case, I get:
filename sample 'c:\temp\sample.txt';
data _null_;
file sample;
input;
put _infile_;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data prova;
infile sample dlm='|' lrecl=50 truncover;
format
codice $3.
could_be_null $20.
nome $20.
luogo $20.
importo 4.
;
input
codice
could_be_null
nome
luogo
importo
;
putlog _infile_;
run;
proc print;
run;
Is it possible to correctly load a file like the one in the example directly in SAS, without manually modifying the original .txt?
You will need to pre-process the file to fix the issue.
If you add quotes around the values then you will not have the problem.
002||"'80S WERE GREAT"|"FORLI'"|1100
If you know that none of the values contain the delimiter, then adding a space before every delimiter
002 | |'80S WERE GREAT |FORLI' |1100
will let you read it without the DSD option.
If lines are shorter than 32K bytes then it can be done in the same step that reads the data.
data test2 ;
infile sample dlm='|' truncover ;
input @;
_infile_ = tranwrd(_infile_,'|',' |');
input (var1-var5) (:$40.);
run;
proc print;
run;
Results:
Obs  var1  var2                         var3             var4    var5
1    001   This variable could be null  PROVA            MILANO  1000
2    002                                '80S WERE GREAT  FORLI'  1100
3    003                                '80S WERE GREAT  ROMA    1110
One way to test if you have the issue is to make sure each line has the right number of fields.
filename sample temp;
options parmcards=sample;
parmcards;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data _null_;
infile sample dsd end=eof;
if eof then do;
call symputx('nfound',nfound);
putlog / 'Found ' nfound :comma11.
'problem lines out of ' _n_ :comma11. 'lines.'
;
end;
input;
retain expect nfound;
words=countw(_infile_,'|','qm');
if _n_=1 then expect=words;
else if expect ne words then do;
nfound+1;
if nfound <= 10 then do;
putlog (_n_ expect words) (=) ;
list;
end;
end;
run;
Example Results:
_N_=2 expect=5 words=4
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
2 002||'80S WERE GREAT|FORLI'|1100 32
_N_=3 expect=5 words=3
3 003||'80S WERE GREAT|ROMA|1110 30
Found 2 problem lines out of 4 lines.
PS Go tell SAS to enhance their delimited file processing: https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-file-variations/idi-p/435977
You need to add the DSD option to your INFILE statement.
https://support.sas.com/techsup/technote/ts673.pdf
DSD (delimiter-sensitive data) option—Specifies that SAS should treat
delimiters within a data value as character data when the delimiters
and the data value are enclosed in quotation marks. As a result, SAS
does not split the string into multiple variables and the quotation
marks are removed before the variable is stored. When the DSD option
is specified and SAS encounters consecutive delimiters, the software
treats those delimiters as missing values. You can change the default
delimiter for the DSD option with the DELIMITER= option.
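A minimal sketch of the consecutive-delimiter behaviour described in that note. The rows below are hypothetical and contain no stray quotation marks; a line such as 002||'80S WERE GREAT|FORLI'|1100 still needs the pre-processing described earlier, because DSD treats the single quote as the start of a quoted value.

```sas
data demo;
  /* DSD: two adjacent delimiters mean a missing value */
  infile datalines dsd dlm='|' truncover;
  input codice :$3. could_be_null :$30. nome :$20. luogo :$20. importo;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||PROVA2|ROMA|1100
;

proc print data=demo;
run;
```

In the second row, could_be_null should come out blank while nome, luogo and importo stay in their own columns.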

Mixed Delimiters in Proc Export

Is there a method to make the first delimiter in an observation different to the rest? In Microsoft SQL Server Integration Services (SSIS), there is an option to set the delimiter per column. I wonder if there is a similar way to achieve this in SAS with an amendment to the below code, whereby the first delimiter would be tab instead and the rest pipe:
proc export
dbms=csv
data=mydata.dataset1
outfile="E:\OutPutFile_%sysfunc(putn("&sysdate9"d,yymmdd10.)).txt"
replace
label;
delimiter='|';
run;
For example
From:
var1|var2|var3|var4
to
var1 var2|var3|var4
...Where the large space between var1 and var2 is a tab.
Many thanks in advance.
Sounds like you just want to make a new variable that has the first two variables combined and then write that out using tab delimiter.
data fix ;
length new1 $50 ;
set have ;
new1=catx('09'x,var1,var2);
drop var1 var2 ;
run;
proc export data=fix ... delimiter='|' ...
Note that you can reference a variable in the DLM= option on the FILE statement in a data step.
data _null_;
dlm='09'x ;
file 'outfile.txt' dsd dlm=dlm ;
set have ;
put var1 @ ;
dlm='|' ;
put var2-var4 ;
run;
Or you could use the catx() trick in a data _null_ step. You might also want to use the vvalue() function to ensure formats are applied.
data _null_;
length newvar $200;
file 'outfile.txt' dsd dlm='|' ;
set have ;
newvar = catx('09'x,vvalue(var1),vvalue(var2));
put newvar var3-var4 ;
run;
Update: fixed the order of delimiters to match the question.
Final code based on the marked answer by Tom:
data _null_;
dlm='09'x ;
file "E:\outputfile_%sysfunc(putn("&sysdate9"d,yymmdd10.)).txt" dsd dlm=dlm ;
set work.have;
put
var1 @ ;
dlm='|';
put var2 var3 var4;
run;

Put/print mean of a variable to log in SAS

How do I print the mean of a variable to the log in SAS?
data fruit;
input zip fruit & $22. pounds;
datalines;
10034 apples, grapes kiwi  123456
92626 oranges  97654
25414 pears apple  987654
;
This is what I've tried:
data _null_;
set fruit;
put mean(zip);
run;
You can use the MEANS procedure to calculate the mean of the pounds variable, followed by a CALL SYMPUT routine to assign the value to a macro variable, and finally a %PUT to print it in the log.
proc means data=fruit;
var pounds;
output out=testmean mean=fruit_mean;
run;
data _null_;
set testmean;
call symput("fruit_avg",fruit_mean);
run;
%put mean of x is &fruit_avg;
You can use PROC SQL.
proc sql noprint;
/*noprint as you don't want to print to the defined output location, just the log*/
/*format, formats the value. into :<> puts the value into a macro variable named <>*/
select mean(zip) format=best32.
into :zip_mean
from fruit;
/*%put writes a message to the log*/
%put Mean of Zip: &zip_mean;
quit;
If you are OK writing the value to the open output location then just use:
proc sql;
select mean(zip) format=best32.
from fruit;
quit;
In a data step, you can use the putlog statement to print the value to the log. Based on @cherry_bueno's answer, the putlog version is:
proc means data=fruit;
var pounds;
output out=testmean mean=fruit_mean;
run;
data _null_;
set testmean;
putlog 'mean of x is ' fruit_mean;
run;
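The original attempt fails for two reasons: PUT takes variables and constants rather than function calls, and MEAN() works across its arguments within one observation, not down a column. If you'd rather not call a procedure, the mean can be accumulated in the data step itself. A minimal sketch, assuming the fruit data set above; total, n and mean_pounds are names introduced here:

```sas
data _null_;
  set fruit end=last;
  total + pounds;                 /* sum statement: retained, skips missing values */
  n + (not missing(pounds));      /* count only non-missing values */
  if last and n > 0 then do;
    mean_pounds = total / n;
    putlog 'Mean of pounds: ' mean_pounds best12.;
  end;
run;
```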

How to rename variables without using their original names?

I have a data set that I am uploading to SAS. There are always 4 variables in the exact same order. The problem is that the variables sometimes have slightly different names.
For example, the first variable is user one day; the next day I get the same dataset and it might be userid. So I cannot use rename(user=my_user).
Is there any way I could refer to the variables by their order? Something like this:
rename(var_order_1=my_user) ;
rename(var_order_3=my_inc) ;
rename _ALL_=x1-x4 ;
There are a few ways to do this. One is to determine the variable names from PROC CONTENTS or dictionary.columns and generate rename statements.
data have;
input x1-x4;
datalines;
1 2 3 4
5 6 7 8
;;;;
run;
%macro rename(var=,newvar=);
rename &var.=&newvar.;
%mend rename;
data my_vars; *the list of your new variable names, and their variable number;
length varname $10;
input varnum varname $;
datalines;
1 FirstVar
2 SecondVar
3 ThirdVar
4 FourthVar
;;;;
run;
proc sql; *Create a list of macro calls to the rename macro from joining dictionary.columns with your data. ;
* Dictionary.columns is like proc contents.;
select cats('%rename(var=',name,',newvar=',varname,')')
into :renamelist separated by ' '
from dictionary.columns C, my_vars M
where C.memname='HAVE' and C.libname='WORK'
and C.varnum=M.varnum;
quit;
proc datasets;
modify have;
&renamelist; *use the calls;
quit;
Another is to put/input the data using the input stream and the _INFILE_ automatic variable (that references the current line in the input stream). Here's an example. You would of course keep only the new variables if you wanted.
data have;
input x1-x4;
datalines;
1 2 3 4
5 6 7 8
;;;;
run;
data want;
set have;
infile datalines truncover; *or it will go to next line and EOF prematurely;
input @1 @@; *Reinitialize to the start of the line or it will eventually EOF early;
_infile_=catx(' ',of _all_); *put to input stream as space delimited - if your data has spaces you need something else;
input y1-y4 @@; *input as space delimited;
put _all_; *just checking our work, for debugging;
datalines; *dummy datalines (could use a dummy filename as well);
;;;;
run;
Here is another approach using the dictionary tables..
data have;
format var1-var4 $1.;
call missing (of _all_);
run;
proc sql noprint;
select name into: namelist separated by ' ' /* create macro var */
from dictionary.columns
where libname='WORK' and memname='HAVE' /* uppercase */
order by varnum; /* should be ordered by this anyway */
quit;
%macro create_rename(invar=);
%do x=1 %to %sysfunc(countw(&namelist,%str( )));
/* OLDVAR = NEWVARx */
%scan(&namelist,&x) = NEWVAR&x
%end;
%mend;
data want ;
set have (rename=(%create_rename(invar=&namelist)));
put _all_;
run;
gives:
NEWVAR1= NEWVAR2= NEWVAR3= NEWVAR4=

How to add variables based on counter and output to xml file using xmlengine(xmlmapper) in SAS?

I have two data sets in which some records are relevant to each other.
E.g.
Dataset1
Var1
abcde
bad man
big bang
strange
everyday
exactly
Dataset2
var1
abc
cde
bad
bad man a
stranger
Now I want to compare those records using a loop logic, and here is my code.
%let id1=%sysfunc(open(dataset2,in));
%let colterm=%sysfunc(varnum(&id1,var1));
%do %while(%sysfunc(fetch(&id1)) eq 0);
%let vterm=%sysfunc(getvarc(&id1,&colterm));
data dataset1;
set dataset1;
if index(strip(var1),strip("&vterm"))>0 or index(strip("&vterm"),strip(var1))>0 then do;/*when one contains the other*/
match="Fuzzy";
cnt=cnt+1;
end;
run;
%end;
proc sql noprint;
select max(cnt) into:maxnum/*to get max cnt*/
from dataset1;
quit;
Now dataset1 looks like below
Var1      cnt  match
abcde     2    Fuzzy
bad man   2    Fuzzy
big bang  0
strange   1    Fuzzy
everyday  0
exactly   0
I want to merge those relevant records in dataset2 into dataset1, and the new dataset1 should look like below
Var1      cnt  match  FM_dataset2_1  FM_dataset2_2
abcde     2    Fuzzy  abc            cde
bad man   2    Fuzzy  bad            bad man a
big bang  0
strange   1    Fuzzy  stranger
everyday  0
exactly   0
As you can see, the new variables FM_dataset2_1 and FM_dataset2_2 are generated automatically based on the counter, cnt. But I couldn't work out a proper way to implement this step in SAS code.
Furthermore, I need to output the dataset to an XML file. The result should look like this:
<text>abcde</text>
<match>Fuzzy</match>
<matchitem>abc</matchitem>
<matchitem>cde</matchitem>
The problem, as with the issue above, is how to determine the number of matchitem elements and write them to the file. In the XML map file, I can set the position as follows
<COLUMN name="FM_dataset2_1">
<PATH syntax="XPath">/../matchitem[position()=**1**]</PATH>
...
<COLUMN name="FM_dataset2_2">
<PATH syntax="XPath">/../matchitem[position()=**2**]</PATH>
But this has to be done manually, case by case. Is it possible to customize the map file automatically based on the cnt counter (maxnum)?
Can anybody suggest?
I'm sure there is more efficient code than the following, but I tried to stay with your line of thought. I am not familiar with working with the XML engine, so I'll leave that part to someone else. Otherwise, if you need to create it manually then you were on the right track creating the MAXNUM macro variable, then you can use it in a loop.
%let id1=%sysfunc(open(dataset2,in));
%let colterm=%sysfunc(varnum(&id1,var1));
%do %while(%sysfunc(fetch(&id1)) eq 0);
%let vterm=%sysfunc(getvarc(&id1,&colterm));
data dataset1;
set dataset1;
format vterm $20.;
if match eq "Fuzzy" then output;
if index(strip(var1),strip("&vterm"))>0 or index(strip("&vterm"),strip(var1))>0 then do;
cnt=sum(cnt,1);
match="Fuzzy";
vterm = "&vterm";
output;
end;
else do;
cnt=sum(cnt,0);
output;
end;
run;
proc sort data=dataset1;
by var1 match vterm descending cnt;
proc sort data=dataset1 nodupkey;
by var1 match vterm;
run;
%end;
proc sql;
create table maxcnt as
select
var1,
match,
max(cnt) as cnt
from dataset1
group by 1,2
;
quit;
run;
proc transpose data=dataset1 out=dataset1(drop=FM_dataset2_0 _name_) prefix=FM_dataset2_;
by var1 match;
id cnt;
var vterm;
run;
data dataset1;
merge dataset1 maxcnt;
by var1 match;
run;
%let id2=%sysfunc(close(&id1)); /*closes out dataset2 in case you need it later */
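If the XML is written by hand rather than through the XML engine, an array over the transposed columns avoids hard-coding the number of matchitem elements. A rough sketch, assuming the merged dataset1 from the steps above has at least one FM_dataset2_ column; the output path is hypothetical, only the elements shown in the question are emitted, and special characters such as & or < are not escaped:

```sas
data _null_;
  set dataset1;
  file 'matches.xml';                  /* hypothetical output path */
  array fm {*} FM_dataset2_: ;         /* every transposed match column */
  length line $200;
  line = cats('<text>', var1, '</text>');
  put line;
  if not missing(match) then do;
    line = cats('<match>', match, '</match>');
    put line;
  end;
  /* One matchitem per non-missing match, however many columns exist */
  do i = 1 to dim(fm);
    if not missing(fm{i}) then do;
      line = cats('<matchitem>', fm{i}, '</matchitem>');
      put line;
    end;
  end;
run;
```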