Now the question I have is I have a bigger problem as I am getting "this range is repeated or overlapped"... To be specific my values of label are repeating I mean my format has repeated values like a=aa b=aa c=as kind of. How do I resolve this error. When I use the hlo=M as muntilqbel option it gives double the data...
I am mapping like below.
Santhan=Santhan
Chintu=Santhan
Please suggest a solution.
To convert data to a FORMAT use the CNTLIN= option on PROC FORMAT. But first make sure the data describes a valid format. So read the data from the file.
data myfmt ;
infile 'myfile.txt' dsd truncover ;
length fmtname $32 start $100 value $200 ;
fmtname = '$MYFMT';
input start value ;
run;
Make sure to set the lengths of START and VALUE to be long enough for any actual values your source file might have.
Then make sure it is sorted and you do not have duplicate codes (START values).
proc sort data=myfmt out=myfmt_clean nodupkey ;
by start;
run;
The SAS log will show if any observations were deleted because of duplicate START values.
If you do have duplicate values then examine the dataset or original text file to understand why and determine how you want to handle the duplicates. The PROC SORT step above will keep just one of the duplicates. You might just has exact duplicates, in which case keeping only one is fine. Or you might want to collapse the duplicate observations into a single observation and concatenate the multiple decodes into one long decode.
If you want you can add a record that will add the functionality of the OTHER keyword of the VALUE statement in PROC FORMAT. You can use that to set a default value, like 'Value not found', to decode any value you might encounter that was not in your original source file.
data myfmt_final;
set myfmt_clean end=eof;
output;
if eof then do;
start = ' ';
label = 'Value not found';
hlo = 'O' ;
output;
end;
run;
Then use PROC FORMAT to make the format from the cleaned up data file.
proc format cntlin = myfmt_final;
run;
To convert a FORMAT to a dataset use the CNTLOUT= option on PROC FORMAT.
For example if you had created this format previously.
proc format ;
value $myfmt 'ABC'='ABC' 'BCD'='BCD' 'BCD1'='BCD' 'BCD2'='BCD' ;
run;
then you can use another PROC FORMAT step to make a dataset. Use the SELECT statement if you format catalog has more than one format defined and you just want one (or some) of them.
proc format cntlout=myfmt ;
select $myfmt ;
run;
Then you can use that dataset to easily make a text file. For example a comma delimited file.
data _null_;
set myfmt ;
file 'myfmt.txt' dsd ;
put start label;
run;
The result would be a text file that looks like this:
ABC,ABC
BCD,BCD
BCD1,BCD
BCD2,BCD
You get this error because you have the same code that maps to two different categories. I'm going to guess you likely did not import your data correctly from your text file and ended up getting some values truncated but without the full process it's an educated guess.
This will work fine:
proc format;
value $ test
'a'='aa' 'b'='aa' 'c'='as'
;
run;
This version will not work, because a is mapped to two different values, so SAS will not know which one to use.
proc format;
value $ badtest
'a'='aa'
'a' = 'ba'
'b' = 'aa'
'c' = 'as';
run;
This generates the error regarding overlaps in your data.
The way to fix this is to find the duplicates and determine which code they should actually map to. PROC SORT can be used to get your duplicate records.
Related
This might sound awkward but I do have a requirement to be able to concatenate all the values of a char column from a dataset, into one single string. For example:
data person;
input attribute_name $ dept $;
datalines;
John Sales
Mary Acctng
skrill Bish
;
run;
Result : test_conct = "JohnMarySkrill"
The column could vary in number of rows in the input dataset.
So, I tried the code below but it errors out when the length of the combined string (samplkey) exceeds 32K in length.
DATA RECKEYS(KEEP=test_conct);
length samplkey $32767;
do until(eod);
SET person END=EOD;
if lengthn(attribute_name) > 0 then do;
test_conct = catt(test_conct, strip(attribute_name));
end;
end;
output; stop;
run;
Can anyone suggest a better way to do this, may be break down a column into chunks of 32k length macro vars?
Regards
It would very much help if you indicated what you're trying to do but a quick method is to use SQL
proc sql NOPRINT;
select name into :name_list separated by ""
from sashelp.class;
quit;
%put &name_list.;
As you've indicated macro variables do have a size limit (64k characters) in most installations now. Depending on what you're doing, a better method may be to build a macro that puts the entire list as needed into where ever it needs to go dynamically but you would need to explain the usage for anyone to suggest that option. This answers your question as posted.
Try this, using the VARCHAR() option. If you're on an older version of SAS this may not work.
data _null_;
set sashelp.class(keep = name) end=eof;
length long_var varchar(1000000);
length want $256.;
retain long_var;
long_var = catt(long_var, name);
if eof then do;
want = md5(long_var);
put want;
end;
run;
I am trying to export SAS data into CSV, sas dataset name is abc here and format is
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
I am using following code.
filename exprt "C:/abc.csv" encoding="utf-8";
proc export data=abc
outfile=exprt
dbms=tab;
run;
output is
LINE_NUMBER DESCRIPTION
524JG "24PC AMEFA VINTAGE CUTLERY SET ""DUBARRY"""
so there is double quote available before and after the description here and additional doble quote is coming after & before DUBARRY word. I have no clue whats happening. Can some one help me to resolve this and make me understand what exatly happening here.
expected result:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
There is no need to use PROC EXPORT to create a delimited file. You can write it with a simple DATA step. If you want to create your example file then just do not use the DSD option on the FILE statement. But note that depending on the data you are writing that you could create a file that cannot be properly parsed because of extra un-protected delimiters. Also you will have trouble representing missing values.
Let's make a sample dataset we can use to test.
data have ;
input id value cvalue $ name $20. ;
cards;
1 123 A Normal
2 345 B Embedded|delimiter
3 678 C Embedded "quotes"
4 . D Missing value
5 901 . Missing cvalue
;
Essentially PROC EXPORT is writing the data using the DSD option. Like this:
data _null_;
set have ;
file 'myfile.txt' dsd dlm='09'x ;
put (_all_) (+0);
run;
Which will yield a file like this (with pipes replacing the tabs so you can see them).
1|123|A|Normal
2|345|B|"Embedded|delimiter"
3|678|C|"Embedded ""quotes"""
4||D|Missing value
5|901||Missing cvalue
If you just remove DSD option then you get a file like this instead.
1|123|A|Normal
2|345|B|Embedded|delimiter
3|678|C|Embedded "quotes"
4|.|D|Missing value
5|901| |Missing cvalue
Notice how the second line looks like it has 5 values instead of 4, making it impossible to know how to split it into 4 values. Also notice how the missing values have a minimum length of at least one character.
Another way would be to run a data step to convert the normal file that PROC EXPORT generates into the variant format that you want. This might also give you a place to add escape characters to protect special characters if your target format requires them.
data _null_;
infile normal dsd dlm='|' truncover ;
file abnormal dlm='|';
do i=1 to 4 ;
if i>1 then put '|' #;
input field :$32767. #;
field = tranwrd(field,'\','\\');
field = tranwrd(field,'|','\|');
len = lengthn(field);
put field $varying32767. len #;
end;
put;
run;
You could even make this datastep smart enough to count the number of fields on the first row and use that to control the loop so that you wouldn't have to hard code it.
PROC FORMAT;
VALUE $Gender 'M'='Male'
'F'='Female';
In the above I am passing the value for processing format as 'm'= 'male'
And 'f'='female' ... The same way I need that values to be passed from a file and the values of process format comes dynamically. How do I do that.
Like I need to pass the above mapping m=male, f=female from a file and read the file and pass that mapping to proceed format dynamically.
To do what you are asking, is done if you can place your data into Data set, two steps process
1- Get your raw data into data set
2- Use above data set to get desired format
so let do this-
*step 1-;
DATA fmt;
Infile "Textfile.txt" DSD ;
Retain fmtname '$myfmt'; /*myfmt is what your format name*/;
Length start $2 label $50;
Input start label ;
RUN;
Now since above code will create a dataset with Male female information use same dataset to create your format.
*Step2:
PROC FORMAT CNTLIN=fmt;
RUN;
The simplest way to create a format from a data set is to use the CNTLIN= option in PROC FORMAT.
REQUIRED VARIABLES IN THE FORMAT DATASET (FMTNAME, START, AND LABEL)
Variable Used for
FMTNAME- The format name
START - The left side of the formatting = sign (assumed character) –
must be unique unless defining a Multi-Label format
LABEL- The right side of the formatting = sign
I am trying to print out a delimited file, without having to specify all of the columns. I can get close, but the numeric columns are always quoted:
DATA _NULL_;
SET SASHELP.CARS (obs = 5 keep = Make Model EngineSize);
FILE "foo.csv" DSD DLM=",";
PUT (_all_) (~);
RUN;
foo.csv
"Acura","MDX","3.5"
"Acura","RSX Type S 2dr","2"
"Acura","TSX 4dr","2.4"
"Acura","TL 4dr","3.2"
"Acura","3.5 RL 4dr","3.5"
How can I achieve either:
"Acura","MDX",3.5
"Acura","RSX Type S 2dr",2
"Acura","TSX 4dr",2.4
"Acura","TL 4dr",3.2
"Acura","3.5 RL 4dr",3.5
or:
Acura,MDX,3.5
Acura,RSX Type S 2dr,2
Acura,TSX 4dr,2.4
Acura,TL 4dr,3.2
Acura,3.5 RL 4dr,3.5
~ asks for quoting. So, you're getting quoting.
You can use & instead:
DATA _NULL_;
SET SASHELP.CARS (obs = 5 keep = Make Model EngineSize);
FILE "c:\temp\foo.csv" DSD DLM=",";
PUT (_all_) (&);
RUN;
& has effectively no impact on the data (we've had a question about it once upon a time, I don't recall the ultimate answer, but basically it seems to mostly be used for this specific purpose, even though that's not its purpose).
I am submitting the following SAS code:
proc format;
picture mysdt
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
DATA _NULL_;
call symput("Today", Put(datetime(),mysdt.));
run;
%put t_&today;
The resulting log shows 2 spaces before the datetime:
t_ 201504240150
The problem here is when my macro is resolved it is creating leading space. Why is it creating spaces?
My output should be:
t_201504240150
I know the solution but just wanted to know the reason.
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt.)));
run;
The reason for this is that your format is set up with a default length of 14. Therefore, when you go to put your value into &today it is stored with leading blanks to fill out the length to 14. From the SAS documentation:
DEFAULT=length
specifies the default length of the picture. The value for DEFAULT= becomes the length of picture if you do not give a specific length when you associate the format with a variable.
So, there are a number of options:
Set the default format length to match the expected length of your datetime value (using DEFAULT=12):
proc format;
picture mysdt (default=12)
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
Specify the width in your format:
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt12.)));
run;
Or, as previously answered, use call symputx to trim whitespace:
DATA _NULL_;
call symputx("Today", strip(Put(datetime(),mysdt.)));
run;
Personally, I'd fix the format to default to 12, then you won't need to remember to specify width or use call symputx each time.
call symputx remove leading and tailing space.
DATA _NULL_;
call symputx("Today", Put(datetime(),mysdt.));
run;