Strip apostrophes from a character string (compress?) - sas

I have a string which looks like this:
"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"
And I'd like it to look like this:
ABAR_VAL ACQ_EXPTAX_Y ACQ_EXP_TAX ADJ_MATHRES2
I.e. no apostrophes or commas and single space separated.
What is the cleanest / shortest way to do so in SAS 9.1.3?
Preferably something along the lines of:
call symput ('MyMacroVariable',compress(????,????,????))
Just to be clear, the result needs to be single space separated, devoid of punctuation, and contained in a macro variable.

Here you go..
data test;
var1='"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"';
run;
data test2;
set test;
call symput('macrovar',compbl(compress(var1,'",')));
run;
%put &macrovar;

Is this part of an infile statement or are you indeed wanting to create macro variables that contain these values? If this is part of an infile statement you shouldn't need to do anything if you have the delimiter set properly.
infile foo DLM=',' ;
And yes, you can indeed use the compress function to remove specific characters from a character string, either in a data step or as part of a macro call.
COMPRESS(source<,characters-to-remove>)
Sample Data:
data temp;
input a $;
datalines;
"boo"
"123"
"abc"
;
run;
Resolve issue in a data step (rather than create a macro variable):
data temp2; set temp;
a=compress(a,'"');
run;
Resolve issue whilst generating a macro variable:
data _null_; set temp;
call symput('MyMacroVariable',compress(a,'"'));
run;
%put &MyMacroVariable.;
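Note that with the latter code the macro variable is overwritten on every observation, so after the step it holds only the value from the last record. You'll have to loop through the observations if you want to see the compressed value for each record. :)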
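Because each CALL SYMPUT overwrites the previous value, one way to keep every record's value is to create a numbered macro variable per observation. This is only a sketch: the macro variable names val1, val2, ... and nvals are made up for illustration, and CALL SYMPUTX and CATS assume SAS 9 or later.

```sas
/* Same sample data as above */
data temp;
input a $;
datalines;
"boo"
"123"
"abc"
;
run;

data _null_;
  set temp end=last;
  /* val1, val2, ... are hypothetical names: one macro variable per record */
  call symputx(cats('val', _n_), compress(a, '"'));
  /* nvals (also hypothetical) records how many were created */
  if last then call symputx('nvals', _n_);
run;

%put &val1 &val2 &val3 (n=&nvals);
```

The final %put should list the three values without their quotation marks.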

To compress multiple blanks into one, use compbl : http://www.technion.ac.il/docs/sas/lgref/z0214211.htm
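A small sketch of how the two functions divide the work, using a made-up input with uneven spacing: COMPRESS deletes the listed characters, and COMPBL then collapses each run of blanks into one. CALL SYMPUTX (SAS 9 and later) is used here in place of CALL SYMPUT only because it trims leading and trailing blanks.

```sas
data _null_;
  length raw stripped squeezed $80;
  /* hypothetical input with irregular gaps between the quoted names */
  raw      = '"ABAR_VAL",   "ACQ_EXPTAX_Y",  "ACQ_EXP_TAX"';
  stripped = compress(raw, '",');  /* remove every double quote and comma */
  squeezed = compbl(stripped);     /* collapse each run of blanks to a single blank */
  call symputx('MyMacroVariable', squeezed);
run;

%put [&MyMacroVariable];
```

The brackets in the %put make it easy to confirm that no leading or trailing blanks remain.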

Related

Unable to convert a character variable with numbers with a comma into numeric

I have a set of variables in SAS that should be numeric but are character. The numbers use a comma as the decimal separator and I need a point. For example, I need 19,000417537 to be 19.000417537. I tried translate without success: the comma is still there, and I'm not able to convert the variable to numeric using input(). Can anyone help me please?
Thank you in advance
Best
Use INPUT() with the COMMAX informat.
data have;
length have $20.;
have = "19,000417537";
want = input(have, commax32.);
format want 32.8;
run;
proc print data=have;
run;
Obs    have            want
1      19,000417537    19.00041754
In two steps you can replace the , with . with tranwrd and then use input to convert it to numeric.
data yourdf;
set df;
charnum2=tranwrd(charnum, ",", "."); /*replace , with .*/
numvar = input(charnum2, 12.); /*convert to numeric*/
run;
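TRANSLATE can do the same single-character swap, but its argument order is TRANSLATE(source, to, from), the reverse of TRANWRD(source, target, replacement). That may be why an attempt with translate leaves the comma in place, although the question does not show the call that was tried. A sketch with a hard-coded example value:

```sas
data _null_;
  charnum = '19,000417537';
  /* "to" comes before "from": replace each comma with a period */
  fixed  = translate(charnum, '.', ',');
  numvar = input(fixed, 32.);
  put fixed= / numvar= best32.;
run;
```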
You can use the COMMA informat to read strings with commas in them. But if you want it to treat the commas as decimal points instead of ignoring them then you probably need to use COMMAX instead (or perhaps use the NLNUM informat so that the meaning of commas and periods in the text will depend on your LOCALE settings).
So if the current dataset is named HAVE and the text you want to convert is in the variable named STRING you can create a new dataset named WANT with a new numeric variable named NUMBER with code like this:
data want;
set have;
number = input(string,commax32.);
run;
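To make the difference between the two informats concrete, here is a minimal sketch that reads the same text both ways (the variable names are illustrative only):

```sas
data _null_;
  text = '19,000417537';
  as_comma  = input(text, comma32.);   /* comma is ignored as a thousands separator */
  as_commax = input(text, commax32.);  /* comma is read as the decimal point */
  put as_comma= best32. / as_commax= best32.;
run;
```

With COMMA the value comes out as the eleven-digit integer 19000417537, while COMMAX gives 19.000417537.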

Why is the last character getting removed after applying tranwrd function

I want to replace certain values in my json file (in this example null values with empty quotation marks.) My solution is working correctly but, for some mysterious reason, the last character of the json file is deleted. Regardless of the last character, the code always deletes it - I have also tried with a different json file that ends in curly braces.
What is causing this and more importantly how can I prevent this?
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'G:\test.json' pretty fmtnumeric nosastags keys;
export testdata_;
run;
data _null_;
infile 'G:\test.json';
file 'G:\test.json';
input;
_infile_ = tranwrd(_infile_,'null','""');
put _infile_ ;
run;
To see how the contents change, first run the code up to the data _null_ step and check the file content, then run the last step.
Data _null_ has it correct; don't write to the same file. SAS offers this option, but in the modern day it's almost always the wrong answer, due to how SAS supports this and the fact that storage is sufficiently cheap and fast.
In this case, it looks like it's a relatively easy fix, but you probably should do as suggested and write to a new file anyway - there will be other issues.
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'H:\temp\test.json' pretty fmtnumeric nosastags keys;
export testdata_;
run;
data _null_;
infile 'H:\temp\test.json' end=eof;
file 'H:\temp\test.json';
input @;
putlog _infile_;
_infile_ = tranwrd(_infile_,'null','"" ');
len = length(_infile_);
put _infile_ ;
if eof then put _infile_;
run;
There are two changes. One, I use '"" ' instead of '""' in the tranwrd; otherwise you end up with slightly odd results, with new lines being added. If your JSON parser doesn't like "" , (with a space before the comma), then you may want two tranwrd calls instead, one for null followed by a comma and one for plain null, or something similar (or use a regular expression). What matters is that the number of characters matches between the input and the output. If you can't arrange that (for example because the extra spaces are problematic), then you're left with "write a new file".
Two, I look for the end of the file, then intentionally write out a second line there. That avoids the issue you're having with the bracket, as it avoids having the EOF being written out before the bracket. I'm not 100% sure I know why you need that - but you do.
Another option, which might make more sense, is to only write the lines that actually contain null.
data _null_;
infile 'H:\temp\test.json' sharebuffers;
file 'H:\temp\test.json';
input @;
putlog _infile_;
if find(_infile_,'null') then do;
_infile_ = tranwrd(_infile_,'null','"" ');
put _infile_;
end;
run;
I added sharebuffers because that should make it run a bit faster. Note that I also remove one space from the replacement string; otherwise, something odd in how SAS handles this seems to remove a space from the following line. No idea why, probably something to do with EOL characters.
But again - don't do any of this unless there's no other option. Write a new file.
One strange thing is that the PROC JSON always writes a text file that uses LF as the end of line characters.
So you might be able to get your overwriting of the file to work if you add these caveats:
Use TERMSTR=LF on the INFILE statement.
Use SHAREBUFFERS on the INFILE statement.
Replace the string with the same number of bytes with the TRANWRD() function, and do not put a space as the last character on the line.
I would also search for ': null' instead of just 'null' to reduce risk of replacing those characters in some other string in the file.
data _null_;
infile json SHAREBUFFERS termstr=lf ;
file json ;
input ;
_infile_ = tranwrd(_infile_,': null',': ""');
put _infile_;
run;
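For comparison, the "write a new file" route recommended above might look like the sketch below. The output path G:\test_fixed.json and the filerefs jsonin and jsonout are invented for illustration; TERMSTR=LF follows the observation that PROC JSON writes LF line endings, and LRECL is raised only as a precaution against long lines.

```sas
filename jsonin  'G:\test.json';
filename jsonout 'G:\test_fixed.json';  /* hypothetical output path */

data _null_;
  infile jsonin termstr=lf lrecl=32767;
  file jsonout lrecl=32767;
  input;
  /* the replacement is shorter than the text it replaces, which is fine
     here because nothing is being overwritten in place */
  _infile_ = tranwrd(_infile_, ': null', ': ""');
  put _infile_;
run;
```

Since the input file is never opened for output, the byte-count and end-of-file workarounds described above should not be needed.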

SAS: Break up long string in code

I find it good practice to restrict my code to within 80 characters per line. Since SAS ignores white space, this usually isn't a problem. However, I occasionally need to refer to some string which is excessively long.
For example,
filename infile "B:\This\file\path\is\really\long\but\there\is\nothing\I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\filename.txt";
I can think of two solutions:
1) Insert a carriage return. This, however, makes the code look quite messy and may unwittingly introduce invisible characters (i.e. \r\n) into the string.
filename infile "B:\This\file\path\is\really\long\but\there\is\nothing\
I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\
with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\
filename.txt";
2) Use macro variables to break the string into several parts.
%let part1 = B:\This\file\path\is\really\long\but\there\is\nothing\;
%let part2 = I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\;
%let part3 = with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\;
%let part4 = filename.txt;
filename infile "&part1.&part2.&part3.&part4.";
%let path = %sysfunc(pathname(infile));
%put &path;
Ideally, I would like something which allows me to follow the indentation scheme of the rest of the code.
filename infile "B:\This\file\path\is\really\long\but\there\is\nothing\
    I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\
    with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\
    filename.txt";
A possible solution, at least within the context of this example, would be to bypass a declaration altogether and prompt the user for the input file. This does not appear easy to implement, however.
For this type of situation where the string needs to be used as one token then splitting it into separate macro variables is the best approach.
%let basedir=b:\Main Folder;
%let project=This project\has\many\parts;
%let fname=filename.txt ;
...
infile "&basedir/&project/&fname" ;
Note that SAS is happy to convert your directory delimiters between Unix (/) and Windows (\) style automatically for you.
You could also take advantage of using a fileref to point to a starting point in your directory tree.
filename basedir "&basedir";
...
infile basedir("&project/&fname");
You could also store the path in a text file or dataset and use that to generate the path into a macro variable.
data _null_;
infile 'parameter_file.txt' ;
input filename :$256. ;
call symputx('filename',filename);
run;
...
infile "&filename" ;
Another variation on using macro variable is to use multiple %LET statements to initialize a single macro variable. That way you can break the long string into multiple tokens.
%let fname=B:\This\file\path\is\really\long\but\there\is\nothing;
%let fname=&fname\I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck;
%let fname=&fname\with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies;
%let fname=&fname\filename.txt;
Or you could use a DATA step to set your macro variable instead.
data _null_;
call symputx('fname',catx('\'
,'B:\This\file\path\is\really\long\but\there\is\nothing\I\can'
,'do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck'
,'with\whatever\organization\or\lack\thereof\exists\for\directory'
,'hierarchies\filename.txt'
));
run;
For a situation where you need to put a long string in code, such as a dataset label or some type of description, consider using %cmpres. The macro has limits, but where it can be used it helps keep code inside 80 columns. Here, my CR and other adjacent white space are "compressed" into a single space character.
%macro get_filename(FILEPATH_FILE, FILE)
/DES=%cmpres("returns a file's name, placed into var FILE, removing the
file path from FILEPATH_FILE.");
If you do this a lot, use %SYSFUNC() and COMPRESS() to make a user-defined macro like this:
%macro c(text);
%sysfunc(compress(&text, ,s))
%mend;
filename infile %c("B:\This\file\path\is\really\long\but\there\is\nothing\I\
can\do\about\it\because\it\is\on\a\shared\network\drive\
and\I\am\stuck\with\whatever\organization\or\lack\thereof\
exists\for\directory\hierarchies\and\he\uses\B\as\a\drive\
OMG\who\does\that\filename.txt");
%put %c("B:\This\file\path\is\really\long\but\there\is\nothing\I\
can\do\about\it\because\it\is\on\a\shared\network\drive\
and\I\am\stuck\with\whatever\organization\or\lack\thereof\
exists\for\directory\hierarchies\and\he\uses\B\as\a\drive\
OMG\who\does\that\filename.txt");
Option "s" in the COMPRESS() function removes all whitespace characters.
SAS posts notes like this in the log; you can ignore them:
NOTE: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks.
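If the note is distracting, the NOQUOTELENMAX system option is meant to turn that message off. A hedged sketch, reusing the %c macro and the example path from above and assuming a SAS release that supports the option:

```sas
%macro c(text);
%sysfunc(compress(&text, ,s))
%mend;

options noquotelenmax;  /* stop the long-quoted-string message */

filename infile %c("B:\This\file\path\is\really\long\but\there\is\nothing\I\
can\do\about\it\because\it\is\on\a\shared\network\drive\
and\I\am\stuck\with\whatever\organization\or\lack\thereof\
exists\for\directory\hierarchies\and\he\uses\B\as\a\drive\
OMG\who\does\that\filename.txt");

options quotelenmax;    /* restore the default afterwards */
```

Restoring QUOTELENMAX afterwards keeps the message available for catching genuinely unbalanced quotation marks elsewhere in the program.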

Convert string into numeric and change period to comma separator sas

I have a string called weight that is 85.5
I would like to convert it into a numeric 85,5 and replace the decimal separator with a comma using SAS.
So far I am using this (messy) two step approach
weight_num= (weight*1);
format weight_num COMMAX13.2;
How can this be achieved in a less clumsy way?
Your sample code is the recommended method of changing a variable type.
Another way is the transtrn function, which replaces the . with a comma. This is only a good method if you don't plan to do any calculations on the values.
data have;
set sashelp.class;
keep name weight:;
weight_char=put(weight, 8.1);
run;
data want;
set have;
weight_char=transtrn(weight_char, ".", ",");
run;
proc print data=want;
run;
If you just want to change it so that commas are used for the decimal point instead of periods then why not just use a simple character substitution? Do you also want to change the thousands separator from comma to period? TRANSLATE() is good for that.
weight = translate(weight,',.','.,');
If you want to convert it to a number then use the INPUT() function rather than forcing SAS to convert for you.
weight_num = input(weight,comma32.);
You can then attach whatever format you want to the new numeric variable.
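Putting those pieces together, a minimal sketch with the question's value hard-coded (weight_txt is an invented name for an optional character copy):

```sas
data _null_;
  weight = '85.5';                   /* character value, as in the question */
  weight_num = input(weight, 32.);   /* explicit character-to-numeric conversion */
  /* COMMAX only changes how the number is displayed, not the stored value */
  weight_txt = strip(put(weight_num, commax13.1));
  put weight_num= commax13.2 / weight_txt=;
run;
```

The numeric variable prints as 85,50 under COMMAX13.2, and weight_txt holds the text 85,5.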

SAS: How to delete word between two specific position?

data:
Hell_TRIAL21_o World
Good Mor_Trial9_ning
How do I remove the _TRIAL21_ and _TRIAL9_?
What I did was find the positions of the first _ and the second _. Then I want to remove everything from the first _ through the second _, but the compress function cannot do that. How can I do this?
x = index(string, '_');
if (x>0) then do;
y = x+1;
z = find(string, '_', y);
end;
Text= " Hell_TRIAL21_o World Good Mor_Trial9_ning";
var= catx("",scan(text,1,"_"),"__",scan(text,3,"_"),"_", scan(text,5,"_"));
Note that the length of variable var may not suit your case. Remember to adjust it accordingly.
PERL regular expressions are a good way of identifying this sort of string. call prxchange is the routine that will remove the relevant characters. It requires prxparse beforehand to create the search and replace parameters.
I've used modify here to amend the existing dataset, obviously you may want to use set to write out to a new dataset and test the results first.
data have;
input string $ 30.;
datalines;
Hell_TRIAL21_o World
Good Mor_Trial9_ning
;
run;
data have;
modify have;
regex = prxparse('s/_.*_//'); /* identify and remove anything between 2 underscores */
call prxchange(regex,-1,string);
run;
Or to create a new variable and dataset, just use prxchange (which doesn't require prxparse).
data want;
set have;
new_string = prxchange('s/_.*_//',-1,string);
run;
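One thing to keep in mind with the pattern above: .* is greedy, so when a value contains more than one underscore-delimited tag it removes everything from the first underscore to the last one. A sketch of the difference, using a made-up string with two tags and a negated character class as one possible alternative:

```sas
data _null_;
  length string greedy bounded $30;
  string  = 'Hell_TRIAL21_o Wor_TRIAL9_ld';              /* hypothetical value with two tags */
  greedy  = prxchange('s/_.*_//', -1, string);           /* one match: first _ through last _ */
  bounded = prxchange('s/_[^_]*_//', -1, string);        /* each _tag_ is matched separately */
  put greedy= / bounded=;
run;
```

Here greedy comes out as Hellld while bounded is Hello World. For the sample data in the question, where each value has exactly two underscores, both patterns give the same result.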