How to use variables with a period/dot stop in their name in SAS? - sas

I am confused about how to read variables names with a period/dot. the process stop on when it reads variable with dot/period;
my csv file looks like this:
here is my code:
what the log indicated:
Some of friends suggest me to change variables name, I did that several times, but the result makes me more frustrated:

If you could try a simpler approach using proc import, it automatically converts dots(.) in a variable name into underscore
proc import datafile="X:\folder\sample.csv"
out=out_ds
dbms=csv
replace;
getnames=yes;
run;
data out_ds1;
set out_ds;
d_private=(private='Yes');
run;

Just fix your program to use valid names for your variables.
The names you use in your SAS code need to be valid SAS names. But they do not have to exactly match the column headers in the text file you are trying to read.
Valid names are from 1 to 32 characters. Start with either underscore or a letter and only include digits, letters or underscores. For example you could use Accept_pct as the variable name for the variable that SAS is showing as being an error.

Enable the extended variable names option. Enclose the variables that have spaces or special characters in quotes with an n, like 'this'n.
options validvarname=any;
data want;
input var1 'var.2'n '3rd var'n;
datalines;
1 2 3
;
run;

Related

Unable to convert a character variable with numbers with a comma into numeric

I have a set of variables in SAS that should be numeric but are characters. Numbers are comma separated and I need a point. For example, I need 19,000417537 to be 19.000417537. I tried translate without success. the comma is still there and I'm not able to convert the variable to numeric using input(). Can anyone help me please?
Thank you in advance
Best
Use INPUT() with the COMMAX informat.
data have;
length have $20.;
have = "19,000417537";
want = input(have, commax32.);
format want 32.8;
run;
proc print data=have;
run;
Obs have want
1 19,000417537 19.00041754
In two steps you can replace the , with . with tranwrd and then use input to convert it to numeric.
data yourdf;
set df;
charnum2=tranwrd(charnum, ",", "."); /*replace , with .*/
numvar = input(charnum2, 12.); /*convert to numeric*/
run;
You can use the COMMA informat to read strings with commas in them. But if you want it to treat the commas as decimal points instead of ignoring them then you probably need to use COMMAX instead (Or perhaps use the NLNUM informat instead so that the meaning of commas and periods in the text will depending on your LOCALE settings).
So if the current dataset is named HAVE and the text you want to convert is in the variable named STRING you can create a new dataset named WANT with a new numeric variable named NUMBER with code like this:
data want;
set have;
number = input(string,commax32.);
run;

Find Dot Separated Words in a String

I need to parse a log file to pick out strings that match the following case-insensitive pattern:
libname.data <--- Okay
libname.* <--- Not okay
For those with SAS experience, I'm trying to get SAS dataset names out of a large log.
All strings are space-separated. Some examples of lines:
NOTE: The data set LIBNAME.DATA has 428 observations and 15 variables.
MPRINT(MYMACRO): data libname.data;
MPRINT(MYMACRO): create table libname.data(rename=(var1 = var2)) as select distinct var1, var2 as
MPRINT(MYMACRO): format=date. from libname.data where ^missing(var1) and ^missing(var2) and
What I've tried
This PERL regular expression:
/^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$/mi
https://regex101.com/r/jYkXn5/1
In SAS code:
data test;
line = 'words and stuff libname.data';
test = prxmatch('/^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$/mi', line);
run;
Problem
This will work when the line only contains this exact string, but it will not work if the line contains other strings.
Solution
Thanks, Blindy!
The regex that worked for me to parse SAS datasets from a log is:
/(?!.*[.*]{3})[a-z_]+[a-z0-9_]+(?:\.[a-z0-9_]+)/mi
data test;
line = 'NOTE: COMPRESSING DATA SET LIBNAME.DATA DECREASED SIZE BY 46.44 PERCENT';
prxID = prxparse('/(?!.*[.*]{3})[a-z]+[a-z0-9_]+(?:\.[a-z0-9_]+)/mi');
call prxsubstr(prxID, line, position, length);
dataset = substr(line, position, length);
run;
This will still pick up some SQL select statements but that is easily solvable through post-processing.
You anchored your expression at the beginning, simply remove the first ^ and you're set.
/(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$/mi
You can get by just locating the following landmark text in a log file line.
... data set <LIBNAME>.<MEMNAME> ...
If the data set name is in the log you can presume it was correctly formed.
data want;
length line $1000;
infile LOG_FILE lrecl=1000 length=L;
input line $VARYING. L;
* literally "data set <name>" followed by space or period;
rx = prxparse('/data set (.*?)\.(.*?)[. ]/');
if prxmatch(rx,line) then do;
length libname $8 memname $32;
libname = prxposn(rx,1,line);
memname = prxposn(rx,2,line);
line_number = _n_;
output;
end;
keep libname memname line_number;
run;
Some adjustment would be needed if the data set names are name literals of the form '<anything>'N
There are also a plethora of existing SAS Log file parsers and analyzers out on the web that you can utilize.
The lookahead at the start prevents matching .. but the pattern by itself will not match that, as the character classes are repeated 1 or more times and do not contain a dot.
If you don't want to match ** as well, and the string should not start with *, you can add that to a character class [*.] together with the dot, and take it out of the first character class.
In that case, you could omit the positive lookahead and the anchor:
/[a-z0-9_:-]+(?:[.*][a-z0-9_:-]+)+/i
Regex demo
As the pattern does not contain any anchors, you could omit the m flag.

SAS: Break up long string in code

I find it good practice to restrict my code to within 80 characters per line. Since SAS ignores white space, this usually isn't a problem. However, I occasionally need to refer to some string which is excessively long.
For example,
filename infile "B:\This\file\path\is\really\long\but\there\is\nothing\I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\filename.txt";
I can think of two solutions:
1) Insert a carriage return. This however makes the code look quite messy and may unwittingly introduce invisible characters (i.e \r\n) into the string.
filename infile "B:\This\file\path\is\really\long\but\there\is\nothing\
I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\
with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\
filename.txt";
2) Use macro variables to break the string into several parts.
%let part1 = B:\This\file\path\is\really\long\but\there\is\nothing\;
%let part2 = I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\;
%let part3 = with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\;
%let part4 = filename.txt;
filename infile "&part1.&part2.&part3.&part4.";
%let path = %sysfunc(pathname(infile));
%put &path;
Ideally, I would like something which allows me to follow the indentation scheme of the rest of the code.
filename infile "B:\This\file\path\is\really\long\but\there\is\nothing\
I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck\
with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies\
filename.txt";
A possible solution, at least within the context of this example, would be to bypass a declaration altogether and prompt the use for the input file. This does not appear easy to implement, however.
For this type of situation where the string needs to be used as one token then splitting it into separate macro variables is the best approach.
%let basedir=b:\Main Folder;
%let project=This project\has\many\parts;
%let fname=filename.txt ;
...
infile "&basedir/&project/&fname" ;
Note that SAS is happy to convert your directory delimiters between Unix (/) and Windows (\) style automatically for you.
You could also take advantage of using a fileref to point to a starting point in your directory tree.
filename basedir "&basedir";
...
infile basedir("&project/&fname");
You could also store the path in a text file or dataset and use that to generate the path into a macro variable.
data _null_;
infile 'parameter_file.txt' ;
input filename :$256. ;
call symputx('filename',filename);
run;
...
infile "&filename" ;
Another variation on using macro variable is to use multiple %LET statements to initialize a single macro variable. That way you can break the long string into multiple tokens.
%let fname=B:\This\file\path\is\really\long\but\there\is\nothing;
%let fname=&fname\I\can\do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck;
%let fname=&fname\with\whatever\organization\or\lack\thereof\exists\for\directory\hierarchies;
%let fname=&fname\filename.txt;
Or you could use a DATA step to set your macro variable instead.
data _null_;
call symputx('fname',catx('\'
,'B:\This\file\path\is\really\long\but\there\is\nothing\I\can'
,'do\about\it\because\it\is\on\a\shared\network\drive\and\I\am\stuck'
,'with\whatever\organization\or\lack\thereof\exists\for\directory'
,'hierarchies\filename.txt'
));
run;
For a situation where you need to put a long string in code such as a dataset label or some type of description consider using %cmpres. The function has limits but is useful to keep one inside 80 columns if they can use it. Here, my CR and other adjacent white spaces are being "compressed" in to a single space character.
%macro get_filename(FILEPATH_FILE, FILE)
/DES=%cmpres("returns a file's name, placed into var FILE, removing the
file path from FILEPATH_FILE.");
If you do this a lot, use %SYSFUNC() and COMPRESS() to make a user-defined macro like this:
%macro c(text);
%sysfunc(compress(&text, ,s))
%mend;
filename infile %c("B:\This\file\path\is\really\long\but\there\is\nothing\I\
can\do\about\it\because\it\is\on\a\shared\network\drive\
and\I\am\stuck\with\whatever\organization\or\lack\thereof\
exists\for\directory\hierarchies\and\he\uses\B\as\a\drive\
OMG\who\does\that\filename.txt");
%put %c("B:\This\file\path\is\really\long\but\there\is\nothing\I\
can\do\about\it\because\it\is\on\a\shared\network\drive\
and\I\am\stuck\with\whatever\organization\or\lack\thereof\
exists\for\directory\hierarchies\and\he\uses\B\as\a\drive\
OMG\who\does\that\filename.txt");
Option "s" in the COMPRESS() function removes all whitespace characters.
SAS posts notes on the log, you can ignore them:
NOTE: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks.

Convert string into numeric and change period to comma seperator sas

I have a string called weight that is 85.5
I would like to convert it into a numeric 85,5 and replace the decimal seperator with a comma using SAS.
So far I am using this (messy) two step approach
weight_num= (weight*1);
format weight_num COMMAX13.2;
How can this be achieved in a less clumpsy way??
Your sample code is the recommended method of changing a variable type.
Another way is transtrn function to replace the . with a comma. This is only a good method if you don't plan to do any calculations on the values.
data have;
set sashelp.class;
keep name weight:;
weight_char=put(weight, 8.1);
run;
data want;
set have;
weight_char=transtrn(weight_char, ".", ",");
run;
proc print data=want;
run;
If you just want to change it so that commas are used for decimal point instead of periods then why not just use a simple character substitution. Do you also want to change thousands separator from comma to period? TRANSLATE() is good for that.
weight = translate(weight,',.','.,');
If you want to convert it to a number then use the INPUT() function rather than forcing SAS to convert for you.
weight_num = input(weight,comma32.);
You can then attach whatever format you want to the new numeric variable.

Strip apostrophes from a character string (compress?)

I have a string which looks like this:
"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"
And I'd like it to look like this:
ABAR_VAL ACQ_EXPTAX_Y ACQ_EXP_TAX ADJ_MATHRES2
I.e. no apostrophes or commas and single space separated.
What is the cleanest / shortest way to do so in SAS 9.1.3?
Preferably something along the lines of:
call symput ('MyMacroVariable',compress(????,????,????))
Just to be clear, the result needs to be single space separated, devoid of punctuation, and contained in a macro variable.
Here you go..
data test;
var1='"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"';
run;
data test2;
set test;
call symput('macrovar',COMPBL( COMPRESS( var1,'",',) ) );
run;
%put &macrovar;
Is this part of an infile statement or are you indeed wanting to create macro variables that contain these values? If this is part of an infile statement you shouldn't need to do anything if you have the delimiter set properly.
infile foo DLM=',' ;
And yes, you can indeed use the compress function to remove specific characters from a character string, either in a data step or as part of a macro call.
COMPRESS(source<,characters-to-remove>)
Sample Data:
data temp;
input a $;
datalines;
"boo"
"123"
"abc"
;
run;
Resolve issue in a data step (rather than create a macro variable):
data temp2; set temp;
a=compress(a,'"');
run;
Resolve issue whilst generating a macro variable:
data _null_; set temp;
call symput('MyMacroVariable',compress(a,'"'));
run;
%put &MyMacroVariable.;
You'll have to loop through the observations in order to see the compressed values the variable for each record if you use the latter code. :)
To compress multiple blanks into one, use compbl : http://www.technion.ac.il/docs/sas/lgref/z0214211.htm