SAS Compress function Returns FALSE value

I'm having some trouble using the 'Compress' function in SAS. My aim is to remove any alphabetical characters from the 'Comments' field in this data step.
However, when I execute the code below, the 'NewPrice' field shows FALSE rather than the expected value.
data WORK.basefile2;
length TempFileName Filenameused $300.;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile CSV2 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=3 Filename=TempFileName;
Filenameused= TRANWRD(substr(TempFileName, 48), ".xlsx.csv", "");
informat customer_id $8.;
informat Name $50.;
informat Reco_issue $50.;
informat Reco_action $50.;
informat ICD $50.;
informat Start_date $50.;
format customer_id $8.;
format Name $50.;
format Reco_issue $50.;
format Reco_action $50.;
format ICD $50.;
format Start_date $50.;
format Comments $255.;
format Description $255.;
NewPrice=COMPRESS(Comments, '', 'kd');
input
customer_id $
Name $
'Total Spend'n $
Template $
Product_id $
Description $
Start_date $
CD_Sales
CD_Lines
CD_Level $
CD_Price $
CD_uom $
CD_Discount $
Reco_issue $
Reco_action $
Reco_price $
Reco_discount $
Impact_£
Impact
Agree $
Comments $
ICD $
Structure_Code $
Deal_type $
NewPrice;
run;
Sample data (comma delimited):
ASDFGH,TEST,"£31,333.00",15AH,156907,TEST,18/10/2016,"£4,003.10",222,5,£5.19,M,,Below Hard Floor,Change Rate,£6.63,,£0.48,21.72%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,475266,TEST,11/11/2016,£49.61,29,5,£2.52,EA,,At Hard Floor,Change Rate,£6.36,,£1.28,60.38%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,404740,TEST,21/09/2017,£38.69,1,5,£116.07,EA,,Below Hard Floor,Change Rate,£163.80,,£15.91,29.14%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,476557,TEST,11/11/2016,£32.13,25,5,£1.32,EA,,Below Hard Floor,Change Rate,£2.76,,£0.48,52.17%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,476553,TEST,11/11/2016,£29.17,11,5,£1.29,EA,,Below Hard Floor,Change Rate,£3.39,,£0.70,61.95%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,476557,TEST,11/11/2016,£17.61,5,5,£3.96,EA,,Below Hard Floor,Change Rate,£9.69,,£1.91,59.13%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,475261,TEST,11/11/2016,£16.70,4,5,£10.92,EA,,Below Hard Floor,Change Rate,£26.67,,£5.25,59.06%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,476546,TEST,11/11/2016,£15.73,10,5,£0.96,EA,,Below Hard Floor,Change Rate,£2.67,,£0.57,64.04%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,476549,TEST,11/11/2016,£5.84,3,5,£1.86,EA,,At Hard Floor,Change Rate,£6.00,,£1.38,69.00%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,477340,TEST,11/11/2016,£3.75,2,5,£4.11,EA,,Below Hard Floor,Change Rate,£11.40,,£2.43,63.95%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",,259738,TEST,13/01/2018,"£45,173.66",403,5,£10.35,EA,20,Below Hard Floor,Change Rate,£10.80,,£0.15,4.17%,N,New Prices Agreed £3.52
ASDFGH,TEST,"£31,333.00",,297622,TEST,13/01/2018,£736.60,5,5,£10.95,EA,20,Below Hard Floor,Change Rate,£11.46,,£0.17,4.45%,N,New Prices Agreed £3.75
ASDFGH,TEST,"£31,333.00",,105384,TEST,19/07/2017,£223.44,1,5,£11.25,BG,42.5,Below Hard Floor,Change Rate,£11.49,,£0.08,2.09%,N,New Prices Agreed £3.76
Any help would be greatly appreciated!
Thanks,
Henry

First off, I assume you mean that NEWPRICE=' ', not FALSE, since SAS doesn't have a "FALSE" value per se. A blank (' ') would be treated as false in a boolean expression, though.
COMPRESS is returning ' ' here because the value in COMMENTS consists entirely of letters. Your COMPRESS arguments ask it to keep digits and spaces ('' is still a single space, even if it doesn't look like it; leave the argument out entirely if you don't want to keep spaces). You have no digits in the COMMENTS field for most records, so only spaces are left, and SAS treats an all-blank character value as missing.
For the other records, the problem is that you're keeping spaces, so the result won't convert to a number neatly. You'll want to use INPUT, and probably not keep spaces, to get the value you want. (You probably also want to keep the '.', no?) Alternatively, make NEWPRICE a character field.
Finally, your line:
NewPrice=COMPRESS(Comments, '', 'kd');
comes before the input statement, which is a problem: the value of Comments hasn't been read yet when it runs.
This works, for example. Note I don't understand why you have NewPrice listed in input (and some other fields that won't have definitions either)...
data WORK.basefile2;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile datalines delimiter = ',' MISSOVER DSD ;
informat customer_id $8.;
informat Name $50.;
informat Reco_issue $50.;
informat Reco_action $50.;
informat ICD $50.;
informat Start_date $50.;
format customer_id $8.;
format Name $50.;
format Reco_issue $50.;
format Reco_action $50.;
format ICD $50.;
format Start_date $50.;
format Comments $255.;
format Description $255.;
input
customer_id $
Name $
'Total Spend'n $
Template $
Product_id $
Description $
Start_date $
CD_Sales
CD_Lines
CD_Level $
CD_Price $
CD_uom $
CD_Discount $
Reco_issue $
Reco_action $
Reco_price $
Reco_discount $
Impact_Pound
Impact
Agree $
Comments $
ICD $
Structure_Code $
Deal_type $
;
NewPrice=input(COMPRESS(Comments, '.', 'kd'),best32.);
datalines;
ASDFGH,TEST,"£31,333.00",15AH,156907,TEST,18/10/2016,"£4,003.10",222,5,£5.19,M,,Below Hard Floor,Change Rate,£6.63,,£0.48,21.72%,N,New Prices Agreed £3.76
ASDFGH,TEST,"£31,333.00",15AH,475266,TEST,11/11/2016,£49.61,29,5,£2.52,EA,,At Hard Floor,Change Rate,£6.36,,£1.28,60.38%,N,In negotiations with the customer for new prices
ASDFGH,TEST,"£31,333.00",15AH,404740,TEST,21/09/2017,£38.69,1,5,£116.07,EA,,Below Hard Floor,Change Rate,£163.80,,£15.91,29.14%,N,In negotiations with the customer for new prices
;;;;
run;
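To see the effect of the COMPRESS arguments discussed above in isolation, here is a minimal sketch. The variable names (digits_blanks, digits_only, digits_dot, price) are made up for illustration, and the comment text is borrowed from the sample data; this is not part of the original answer.

```sas
/* Illustration only: compare three ways of calling COMPRESS with the */
/* 'kd' (keep digits) modifiers on one sample comment.                */
data _null_;
  length comments digits_blanks digits_only digits_dot $60;
  comments = 'New Prices Agreed £3.76';

  /* second argument is a blank: keeps digits AND blanks */
  digits_blanks = compress(comments, ' ', 'kd');

  /* second argument omitted: keeps digits only */
  digits_only = compress(comments, , 'kd');

  /* second argument is a period: keeps digits and the decimal point */
  digits_dot = compress(comments, '.', 'kd');

  /* convert the cleaned-up text to a numeric value */
  price = input(digits_dot, best32.);

  put digits_blanks= digits_only= digits_dot= price=;
run;
```

Checking the log for the three character results should make it clear why only the version that keeps the period converts cleanly to the intended price.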

Related

SAS Reading CSV file using input statement

I am using this code to import csv file in sas
data retail;
infile "C:\users\Documents\training\Retails_csv" DSD MISSOVER FIRST OBS =2;
INPUT Supplier :$32. Item_Category :$32. Month :$3. Cost :DOLLAR10. Revenue :DOLLAR10. Unit_Price :DOLLAR10.2 Units_Availed :8. Units_Sold :8.;run;
I need to get Cost, Revenue and Unit_Price in $ format in the output SAS data set.
please someone help
thanks
The INFILE statement does not take the REPLACE keyword. In fact since INFILE is just saying where the input data is coming from there isn't really any logical feature that the INFILE statement might have that would use that name.
You need to attach the DOLLAR format to the variables if you want SAS to print the values using dollar signs and thousands separators. You can either attach the format in the data step or in the steps that print the data.
format cost revenue unit_price dollar10. ;
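As a rough sketch of both options mentioned above (attaching the format in the data step, or only in the step that prints), assuming a tiny made-up data set named retail_demo with invented values:

```sas
/* Hypothetical data: the supplier name and amounts are invented. */
data retail_demo;
  infile datalines dsd truncover;
  input Supplier :$32. Cost :dollar10. Revenue :dollar10.;
  /* Option 1: attach the format permanently in the data step */
  format Cost Revenue dollar10.;
datalines;
Acme,"$1,200","$3,400"
;
run;

/* Option 2: attach (or override) the format only for this report */
proc print data=retail_demo;
  format Cost Revenue dollar12.2;
run;
```

The informat on the INPUT statement only controls how the text is read; it is the FORMAT statement that controls how the stored numbers are displayed.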

SAS colon format modifier

What do the numbers in the grey box represent? And what's a simple way of understanding how the colon modifier affects the way sas reads in values?
The answer depends on information not provided. The answer B is the best choice in the sense that you should use the colon modifier when using informats in the INPUT statement to prevent the use of the formatted input mode instead of list input mode. Otherwise the formatted input could read too many or too few characters and also might leave the cursor in the wrong place for reading the next field.
But if you try to read that data from in-line cards it works fine for those two lines. That is because in-line data lines are padded to next multiple of 80 bytes.
If you put those lines into a file without any trailing spaces on the lines then the second line fails because there are not 10 characters to read for the last field. But if you add the TRUNCOVER option (or PAD) to the INFILE statement then it will work.
Try it yourself. TEST1 and TEST3 work. TEST2 gets a LOST CARD note.
data test1;
input name $ hired date9. age state $ salary comma10.;
format hired date9.;
cards;
Donny 5MAR2008 25 FL $43,123.50
Margaret 20FEB2008 43 NC 65,150
;
options parmcards=test;
filename test temp ;
parmcards;
Donny 5MAR2008 25 FL $43,123.50
Margaret 20FEB2008 43 NC 65,150
;
data test2;
infile test;
input name $ hired date9. age state $ salary comma10.;
format hired date9.;
run;
data test3;
infile test truncover;
input name $ hired date9. age state $ salary comma10.;
format hired date9.;
run;
With different data the first formatted input can cause trouble also. For example if the date values used only 2 digits for the year it would throw things off. So it tries to read FL as the age and then reads the first 8 characters of the salary as the STATE and just blanks as the SALARY.
data test1;
input name $ hired date9. age state $ salary comma10.;
format hired date9.;
cards;
Donny 5MAR08 25 FL $43,123.50
Margaret 20FEB2008 43 NC 65,150
;
Results:
Obs  name      hired      age  state     salary
1    Donny     05MAR2008  .    $43,123.  .
2    Margaret  20FEB2008  43   NC        65150
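For comparison, here is a sketch of the same step with the colon modifier added to both informats, so that each field is read in list mode up to the next delimiter rather than for a fixed number of columns. The data set name test4 is an assumption, not from the original answer.

```sas
/* Same data as above, but :date9. and :comma10. use modified list input, */
/* so a short date value no longer shifts the fields that follow it.      */
data test4;
  input name $ hired :date9. age state $ salary :comma10.;
  format hired date9.;
cards;
Donny 5MAR08 25 FL $43,123.50
Margaret 20FEB2008 43 NC 65,150
;
run;

proc print data=test4;
run;
```

The two-digit year is interpreted according to the YEARCUTOFF= system option, so the resulting century depends on the session settings.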

reading large csv file that contains a column with large text values

I am having an issue reading a large data set into SAS; it contains a review_text column with large text values. The first column, review_id, shows the observation number rather than the actual id. The other columns have wrong values, which are displaced into other variables.
DATA review;
INFORMAT review_id $50. ;
INFORMAT review_text $5000. ;
INFORMAT business_name $100. ;
INFORMAT business_id $100. ;
INFORMAT review_date mmddyy10. ;
INFORMAT city $25. ;
INFORMAT state $20. ;
INFORMAT address $250. ;
INFORMAT user_id $100. ;
INFORMAT user_name $100. ;
INFORMAT friends $500. ;
INFORMAT yelping_since mmddyy10. ;
INFORMAT categories $100. ;
INFILE 'C:\users\scott\desktop\yelp_food_reviews.csv' DELIMITER= ',' dsd LRECL=32767 FIRSTOBS=2;
INPUT review_id $ review_text $ business_name $ business_id $ review_date review_rating business_rating num_biz_reviews city $ state $ address $ postal_code 8. latitude 8.10 longitude 8.10 mon_hours $ tues_hours $ wed_hours $ thurs_hours $ fri_hours $ sat_hours $ sun_hours $ user_id $ user_name $ user_reviews_given 8. ave_rating_given 4.1 friends $ yelping_since categories $ is_open 1.;
run;

SAS: Taking Date data in DD-MMM-YYYY format from a csv file in a date format in a permanent data set

I would like to import data from a csv file into a permanent data set. The file has a date column with values in "dd-mmm-yyyy" format, like "22-FEB-1990". I want this to be stored as a date inside the data set too. I have tried many formats and informats, but I am not getting anything in the column.
Here is the code I wrote (while I commented out certain things, I have tested all the permutations and combinations of formats and informats I could think of):
libname asgn1 "C:\Users\*****\abc";
data asgn1.Car_sales_1_1;
infile "C:\Users\********\Car_sales.csv" dsd dlm="," FIRSTOBS=2 ;
input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
* format Latest_Launch mmddyy10.;
* informat Latest_Launch mmddyy10.;
run;
Please help...
Change your informat to date11. (dd-mmm-yyyy).
SAS Informats by Category > http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001239776.htm
I tried the following code and I got just the result I wanted. Thanks @Chris J
libname asgn1 "C:\Users\*****\abc";
data asgn1.Car_sales_1_1;
infile "C:\Users\********\Car_sales.csv" dsd dlm="," FIRSTOBS=2 ;
input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
informat Latest_Launch date11.;
format Latest_Launch ddmmyy10.;
run;
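A self-contained way to check the same idea without the csv file is sketched below, using in-line data. The data set name launch_demo and the sample row are invented for illustration; only the "22-FEB-1990" value comes from the question.

```sas
/* Hypothetical one-row example: declare the informat before INPUT so */
/* list input reads the dd-mmm-yyyy text as a SAS date value.         */
data launch_demo;
  infile datalines dsd dlm=',' truncover;
  informat Latest_Launch date11.;
  format Latest_Launch ddmmyy10.;
  input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
datalines;
Acura,Integra,28,22-FEB-1990
;
run;

proc print data=launch_demo;
run;
```

The informat controls how the text is read, while the format only controls how the stored date is displayed, so the two do not need to match.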

Reconstitute .txt file of HTML table as Dataset in SAS

I am currently using SAS version 9 to try and read in a flat file in .txt format of a HTML table that I have taken from the following page (entitled Wayne Rooney's Match History):
http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney
I've got the data into a .txt file using a Python web scraper built with Scrapy. The format of my .txt file is like this:
17-08-2013,1 : 4,Swansea,Manchester United,28',7.26,Assist Assist,26-08-2013,0 : 0,Manchester United,Chelsea,90',7.03,None,14-09-2013,2 : 0,Manchester United,Crystal Palace,90',8.44,Man of the Match Goal,17-09-2013,4 : 2,Manchester United,Bayer Leverkusen,84',9.18,Goal Goal Assist,22-09-2013,4 : 1,Manchester City,Manchester United,90',7.17,Goal Yellow Card,25-09-2013,1 : 0,Manchester United,Liverpool,90',None,Man of the Match Assist,28-09-2013,1 : 2,Manchester United,West Bromwich Albion,90'...
...and so on. What I want is a dataset that has the same format as the original table. I know my way round SAS fairly well, but tend not to use infile statements all that much. I have tried a few variations on a theme, but this syntax has got me the nearest to what I want so far:
filename myfile "C:\Python27\Football Data\test.txt";
data test;
length date $10.
score $6.
home_team $40.
away_team $40.
mins_played $3.
rating $4.
incidents $40.;
infile myfile DSD;
input date $
score $
home_team $
away_team $
mins_played $
rating $
incidents $ ;
run;
This returns a dataset with only the first row of the table included. I have tried using fixed widths and pointers to set the dataset dimensions, but because the length of things like team names can change so much, this is causing the data to be reassembled from the flat file incorrectly.
I think I'm most of the way there, but can't quite crack the last bit. If anyone knows the exact syntax I need that would be great.
Thanks
I would read it straight from the web. Something like this; it works about 50% but took a whole ten minutes to write, so I'm sure it could be easily improved.
The basic approach is to use @'string' to read the text that follows a given string. You might be better off reading this in as a byte stream, doing a regular expression match on <tr> ... </tr>, and then parsing that, rather than taking the more brute-force method used here.
filename rooney url "http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney" lrecl=32767;
data rooney;
infile rooney scanover;
retain are_reading;
input @;
if find(_infile_,'<table id="player-fixture" class="grid fixture">')
then are_reading=1;
if find(_infile_,'</table>') then are_reading=0;
if are_reading then do;
input @'<td class="date">' date ddmmyy10.
@'class="team-link">' home_team $20.
@'class="result-1 rc">' score $10.
@'class="team-link">' away_team $20.
@'title="Minutes played in this match">' mins_played $10.
@'title="Rating in this match">' rating $6.
;
output;
end;
run;
As far as reading the scrapy output, you should change at least two things:
Add the delimiter. Not truly necessary, but I'd consider the code incorrect without it, unless delimiter is space.
Add a trailing "@@" to get SAS to hold the line pointer, since you don't have line feeds in your data.
data want;
infile myfile flowover dlm=',' dsd lrecl=32767;
length date $10.
score $6.
home_team $40.
away_team $40.
mins_played $3.
rating $4.
incidents $40.;
input date $
score $
home_team $
away_team $
mins_played $
rating $
incidents $ @@;
run;
Flowover is default, but I like to include it to make it clear.
You also probably want to input the date as a date value (not char), so informat date ddmmyy10.;. The rating is also easily input as numeric if you want to, and both mins played and score could be input as numeric if you're doing analysis on those by adding ' and : to the delimiter list.
Finally, your . on length is incorrect; SAS is nice enough to ignore it, but . is only placed like so for formats.
Here's my final code:
data want;
infile "c:\temp\test2.txt" flowover dlm="',:" lrecl=32767;
informat date ddmmyy10.
score_1 score_2 2.
home_team $40.
away_team $40.
mins_played 3.
rating 4.2
incidents $40.;
input date
score_1
score_2
home_team $
away_team $
mins_played
rating ??
incidents $ @@;
run;
I remove the dsd as that's incompatible with the ' delimiter; if DSD is actually needed then you can add it back, remove that delimiter, and read minutes in as char. I add ?? for rating as it sometimes is "None" so ?? ignores the warnings about that.
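To illustrate just the ?? modifier mentioned above, here is a small sketch with made-up in-line data (the data set name rating_demo is hypothetical). The non-numeric value should simply become missing, without the invalid-data note in the log.

```sas
/* ?? after the variable name suppresses the invalid-data messages and */
/* keeps _ERROR_ from being set when a value like "None" is read.      */
data rating_demo;
  input rating ??;
datalines;
7.26
None
8.44
;
run;

proc print data=rating_demo;
run;
```

Removing the ?? and rerunning the step is an easy way to see the log messages it suppresses.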