Replacing consecutive embedded blanks with another character in SAS

I'm trying to replace embedded spaces in one of my variables (QPR) with a new character. Here is my (abbreviated) code:
data sas2;
input QPR $ & 1-9;
QPR=tranwrd(strip(QPR)," ","0");
run;
proc print data=sas2;
run;
The tranwrd function seems to work for observations with one embedded blank; however, it does not work when there are two blanks in a row.
For example, 234 2345 becomes 23402345, but 234  345 (two blanks) becomes 234 (i.e., the rest gets cut off, I assume because of STRIP). Instead, I want 23400345.
I also tried TRANWRD without the STRIP function, but then 234  345 becomes 23400000 instead. TRANSLATE does the same thing.
Any ideas on why this won't work and how to fix it? Alternatively, are there easier/better ways to do this in the data step?

The & modifier in your INPUT statement tells SAS to stop reading a value when it reaches two or more consecutive blanks. After SAS stops reading, it pads the rest of the value with spaces up to the full length of 9 characters. That is why you got a run of zeros at the end of the string when you didn't use STRIP. Removing the & should fix it.
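A minimal sketch of the corrected step, assuming the data are supplied in-stream with DATALINES (the data set name sas2 and the two sample values come from the question). With plain column input, the embedded blanks in columns 1-9 are kept, so STRIP only trims the outer blanks and TRANWRD turns each inner blank into a zero.

```sas
data sas2;
  /* Column input without the & modifier: columns 1-9 are read as-is, */
  /* including runs of embedded blanks.                                */
  input QPR $ 1-9;
  /* STRIP removes leading/trailing blanks so only embedded ones become '0'. */
  /* TRANSLATE(strip(QPR), '0', ' ') would give the same result here.        */
  QPR = tranwrd(strip(QPR), ' ', '0');
datalines;
234 2345
234  345
;
run;

proc print data=sas2;
run;
```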

Related

Is there any function in SAS that can find the exact value in a variable?

Suppose I have a column called ABC whose values look like this:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
I want to generate the desired variable in the next column with the value 1122. I have a long list of values like "1122" that I need to look up in column ABC; if an exact match is found, the new variable should be generated. However, I don't want to match values like 112233, because that is not the value I am looking for.
For example, in all three lines above I want to pick up only the matching value "1122".
I really have no idea how to solve this problem. I have tried wildcards, but without much success. Any help would be much appreciated.
It is hard to tell from your description, but from the values you show, it looks like you want the INDEXW() function. It lets you search a string for a matching word, with an option to specify which characters are considered separators between words. The result is the position where the word starts within the longer string; when the word is not found, the result is zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS treats any value other than zero (or missing) as TRUE, so you can use the INDEXW() function call directly in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;
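To see why the word search matters for the 112233 case, here is a small sketch comparing INDEX (plain substring search) with INDEXW on the same three values. The data set name compare and the variables pos_index, pos_indexw and found are illustrative names, not part of the original answer.

```sas
data have;
  input abc $30.;
  datalines;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
run;

data compare;
  set have;
  /* INDEX: plain substring search, so '1122' also matches inside '112233' */
  pos_index  = index(abc, '1122');
  /* INDEXW: '1122' must be a whole word, with '_' as the only delimiter. */
  /* TRIM keeps the trailing blanks from sticking to the last word.       */
  pos_indexw = indexw(trim(abc), '1122', '_');
  /* Create the desired value only on an exact word match */
  length found $4;
  if pos_indexw > 0 then found = '1122';
run;

proc print data=compare;
run;
```

On these rows INDEX reports position 5 for the first two values (the hit inside 112233), while INDEXW reports the position of the standalone 1122 word.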

Use of : when reading multiple records in SAS

I am studying SAS programming and there is one thing that puzzles me. I tried to look up what colons (:) do in the textbook I am using, but I could not find anything.
The following program was one of the questions. With the colons, the program reads the in-stream data correctly, but without them it reads the data incorrectly.
I suspect that the length of ABRAMS is less than 12 and that is why it is read incorrectly, but with the colon, for some reason, it is read fine.
I appreciate your help.
data a;
input #1 Lname $ Fname $ /
Department : $12. Salary : comma.10;
cards;
ABRAMS THOMAS
SALES $25,209.03
;
run;
proc print;
run;
Have a look at the documentation for the input statement. There is admittedly quite a lot of it, so here's a link to the specific page that deals with this:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000144370.htm
Relevant quote:
:
enables you to specify an informat that the INPUT statement uses to
read the variable value. For a character variable, this format
modifier reads the value from the next non-blank column until the
pointer reaches the next blank column, the defined length of the
variable, or the end of the data line, whichever comes first. For a
numeric variable, this format modifier reads the value from the next
non-blank column until the pointer reaches the next blank column or
the end of the data line, whichever comes first.
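To make the quote concrete, here is a sketch that reads the same two lines with and without the colon. It assumes COMMA10. is the intended salary informat (the question writes it as comma.10) and drops the redundant #1; the data set names are illustrative. Without the colon, $12. consumes exactly 12 columns ("SALES $25,20"), so the salary read starts at column 13 and sees only "9.03". With the colon, each read stops at the next blank.

```sas
/* Without the colon: $12. and comma10. read fixed-width fields */
data without_colon;
  input Lname $ Fname $ /
        Department $12. Salary comma10.;
  datalines;
ABRAMS THOMAS
SALES $25,209.03
;
run;

/* With the colon: each value is read from the next non-blank column */
/* up to the next blank, and the informat is applied to that token.   */
data with_colon;
  input Lname $ Fname $ /
        Department : $12. Salary : comma10.;
  datalines;
ABRAMS THOMAS
SALES $25,209.03
;
run;

proc print data=without_colon;
run;
proc print data=with_colon;
run;
```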

When reading varbinary data from Amazon RDS, SAS appends spaces to the end of the data. Can we avoid it?

When SAS reads varbinary data from Amazon RDS, it appends spaces to the end of the data.
proc sql;
select emailaddr from tablename1;
quit;
The column emailaddr is varbinary(20).
For example:
I inserted "XX#WWW.com ", but when reading from the database, SAS appends spaces up to the length of the column.
Since the column length is 20, it returns "XX#WWW.com" padded with trailing spaces to 20 characters (note the appended spaces). I cannot use the TRIM() function, since it would also remove spaces that might genuinely be part of the originally inserted data.
How can I stop SAS from appending these spaces?
My program needs the exact data as stored in the database, without any extra spaces attached.
That's how SAS works: Base SAS has only a CHAR-equivalent character type and no VARCHAR concept (DS2 is different). Whatever the length of the column is (20 here), every value has 20 characters in total, padded with trailing spaces.
Most of the time it doesn't matter; when SAS inserts into another RDBMS, for example, it typically treats trailing spaces as nonexistent (so they won't be inserted). If you're using regular expressions or concatenation on these values, you can use TRIM and similar functions to deal with the spaces; CATS and similar functions perform concatenation with trimming.
If trailing spaces are part of your data, you are mostly out of luck in SAS, because SAS considers trailing spaces insignificant (for example, 'abc' and 'abc   ' compare as equal). You can append a non-space character in SQL, translate the spaces to non-breaking spaces ('A0'x) or something else while still in SQL, or put quotes or some other marker around your actual values, but whatever you do will be complicated.
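A short sketch of the padding behavior described above. The values are illustrative, and the '|' marker stands in for a hypothetical non-space character appended on the database side (for example in pass-through SQL) before SAS reads the column.

```sas
data demo;
  length email $20 raw $12;
  email = 'XX#WWW.com';

  /* Trailing blanks are insignificant: the padded value equals the short literal */
  same = (email = 'XX#WWW.com');
  /* LENGTH ignores the padding; LENGTHC counts it (the defined length, 20) */
  len  = length(email);
  lenc = lengthc(email);

  /* Hypothetical marker approach: the database appended '|' to the value */
  /* 'XX#WWW.com ' (note the genuine trailing blank) before SAS read it.    */
  raw = 'XX#WWW.com |';
  /* Everything before the marker counts, so the genuine blank is kept in the length */
  true_len = length(raw) - 1;
run;

proc print data=demo;
run;
```

The marker only preserves the information (here, the true length 11); any SAS variable you copy the value into is still padded to its defined length.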

input an array in SAS

I need to read multiple raw text files into a SAS data set. Each file contains several ingredients, as shown in the example files below. Each file (a dish) lists all its ingredients on one line, separated by commas. The number of ingredients varies. Some example files (dishes):
Example file 1 (dish1.csv):
Tomate, Cheese, Ham, Bread
Example file 2 (dish2.csv):
Sugar, Apple
Example file 3 (dish3.csv):
Milk, Sugar, Cacao
Because I have about 250 files (dishes), I created a macro program to read those files. That way I can execute this macro in another macro to read all the dishes I need. The program looks like this:
%macro readDish(dishNumber);
  data newDish;
    * Find and read the csv-file;
    infile "my_file_location/dish&dishNumber..csv" dlm="," missover;
    * Read up to 25 ingredients;
    input ingredient1-ingredient25 : $25.;
    * Put all ingredients in an array;
    array ingredients{25} ingredient1-ingredient25;
    * Loop through all the ingredients and output;
    do i=1 to dim(ingredients);
      dishNumber = &dishNumber;
      ingredient = ingredients{i};
      output;
    end;
  run;
%mend;
Is it possible to create a SAS (macro) program that is able to read all dishes, no matter how many ingredients I have? The SAS table should look like this:
1 Tomate
1 Cheese
1 Ham
1 Bread
Seems straightforward to me: read the data in vertically, then, if you need it horizontal, add a transpose step afterwards. You don't have to read a whole line in one step: the @@ line-hold specifier tells SAS to keep the input pointer on the current line, so you can read in one value at a time.
data dishes;
  length _file $1024
         ingredient $128;
  infile "c:\temp\dish*.csv" dlm=',' filename=_file lrecl=32767; *or whatever your LRECL needs to be;
  input ingredient $ @@;   *@@ keeps reading the same line on the next iteration;
  *take the digits from the file name, e.g. dish12.csv gives 12;
  dishnumber = input(compress(scan(_file,-2,'\.'),,'kd'),12.);
  output;
run;
Here I use a wildcard to read them all in. You can of course use a macro with similar code if you need to, though a wildcard or a concatenated filename is probably easier. The way I get dishnumber might not always work, depending on how the filenames are constructed, but some form of it should be usable.
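For the "make it horizontal" step mentioned above, here is a minimal PROC TRANSPOSE sketch. To keep it self-contained it builds a small long-format DISHES table with DATALINES instead of reading the CSV files; the prefix ingredient and the output name dishes_wide are illustrative choices.

```sas
/* Long format: one row per dish and ingredient */
data dishes;
  infile datalines dlm=',';
  length ingredient $128;
  input dishnumber ingredient $;
  datalines;
1,Tomate
1,Cheese
1,Ham
1,Bread
2,Sugar
2,Apple
;
run;

/* TRANSPOSE with BY needs the data sorted by the BY variable */
proc sort data=dishes;
  by dishnumber;
run;

/* One row per dish: ingredient1, ingredient2, ... up to the largest dish */
proc transpose data=dishes out=dishes_wide(drop=_name_) prefix=ingredient;
  by dishnumber;
  var ingredient;
run;

proc print data=dishes_wide;
run;
```

Dishes with fewer ingredients simply get blank values in the trailing ingredient columns, so no fixed upper limit such as 25 is needed.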
To expand on why this works: the DATA step in SAS is a constant loop that runs over the code repeatedly until it encounters an "end condition". The most common end conditions are the STOP statement and any attempt to read from a SET or INFILE when no further read is possible (for example, you read a 100-row SAS data set, the step tries to read row 101, fails, and so the DATA step ends). Other than that, it keeps executing the same code until it gets there; at the RUN it just does some housekeeping (such as the implicit OUTPUT and resetting non-retained variables to missing) before starting the next iteration.
When reading from files with INFILE, SAS usually reads a line and then, at the RUN, skips forward to the next EOL (end of line, usually a carriage return and line feed on Windows) if it's not already at one. That is often useful, perhaps usually. But in some cases you'd rather have SAS keep reading the same line.
In comes the @@ line-hold specifier. @@ says "do not advance to the next line, even when you hit RUN". (A single trailing @ says "do not advance to the next line until you hit RUN"; without any trailing @, each INPUT statement finishes with its line, so the next INPUT starts on a new one.) Thus, when the next DATA step iteration starts, the input pointer is exactly where you left it: right after the previous field you read.
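The line-hold specifier discussed above is written @@ (the double trailing at sign) in SAS code, and its effect can be seen without any files. In this sketch, in-stream data stand in for the CSV files and the data set name dish_items is illustrative; each DATA step iteration reads a single ingredient, and @@ keeps the pointer on the line so the next iteration continues where the last one stopped.

```sas
data dish_items;
  infile datalines dlm=',';
  length ingredient $25;
  /* @@ holds the current line across iterations, so each pass reads */
  /* one comma-separated value; at the end of a line INPUT moves on   */
  /* to the next line, and the step ends when the data run out.       */
  input ingredient $ @@;
  datalines;
Tomate, Cheese, Ham, Bread
Sugar, Apple
;
run;

proc print data=dish_items;
run;
```

The $ informat left-aligns each value, so the blank after each comma does not end up in the stored ingredient.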
This was highly useful in the 1960s and 1970s, when punch cards were the trendy new thing and input was often laid out without regard to any line organization. In particular, if you entered just one 8-column variable per card, you would waste the other 72 columns, so many values went onto each card. That is just like your ingredients: many pieces of data per input row, which you want translated into one piece of data per row in the data set. While it's not as common nowadays to store data this way, it certainly still happens, as your data show.

SAS: Where statement not working with string value

I'm trying to use PROC FREQ on a subset of my data called dataname. I would like it to include all rows where varname doesn't equal "A.Never Used". I have the following code:
proc freq data=dataname(where=(varname NE 'A.Never Used'));
run;
I thought there might be a problem with trailing or leading blanks so I also tried:
proc freq data=dataname(where=(strip(varname) NE 'A.Never Used'));
run;
My guess is that, for some reason, my string values are not exactly "A.Never Used", but whenever I print the data, that is the value I see.
This is a common issue when dealing with string data (and a good reason to avoid it where you can!). Consider the source of your data. Did it come from web forms? Then it probably contains non-breaking spaces ('A0'x) instead of regular spaces ('20'x). Did it come from a Unicode environment (say, one where Japanese characters are legal)? Then you may have transcoding issues.
A few options that work for a large majority of these problems:
Compress out everything but alphabetic characters, for example where=(compress(varname,,'ka') ne 'ANeverUsed'). The modifiers 'ka' mean "keep only" (k) "alphabetic characters" (a).
Use UPCASE or LOWCASE to make sure you're not running into case issues.
Use put varname $hexw.; in a DATA step (with w twice the variable's length, for example $hex24. for a 12-character value) to look at the underlying characters. Each pair of hex digits is one byte, which is one character in a single-byte encoding; 20 is a regular space (which STRIP would remove from the ends). Sort by varname before doing this so that the rows you think should have this value sit next to each other; what is the difference? Probably some special character, a multibyte character, or who knows what, but it should be apparent here.
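A sketch of the non-breaking-space case described above, under the assumption that the stray character is the single byte 'A0'x and the session uses an ASCII-based encoding such as Latin-1 (in UTF-8 data a non-breaking space is the two bytes 'C2A0'x instead). The data are made up to mimic the question, and hexval, plain_ne and fixed_ne are illustrative names.

```sas
/* Made-up data: the first value prints as 'A.Never Used' but the gap */
/* between the words is a non-breaking space ('A0'x), not '20'x.      */
data dataname;
  length varname $20;
  varname = 'A.Never' || 'A0'x || 'Used';
  output;
  varname = 'B.Sometimes';
  output;
run;

data check;
  set dataname;
  length hexval $24;
  /* $HEX24. shows the first 12 bytes; look for A0 where you would expect 20 */
  hexval = put(varname, $hex24.);
  /* The plain comparison lets the 'A.Never Used' look-alike through */
  plain_ne = (varname ne 'A.Never Used');
  /* Mapping 'A0'x back to a regular blank makes the comparison behave */
  fixed_ne = (translate(varname, ' ', 'A0'x) ne 'A.Never Used');
  put varname= hexval= plain_ne= fixed_ne=;
run;

/* The same fix applied in the original PROC FREQ call */
proc freq data=dataname(where=(translate(varname, ' ', 'A0'x) ne 'A.Never Used'));
  tables varname;
run;
```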