Solution to WORK disk space not enough in SAS

I have more than 50 tables being created in WORK. It used to work well,
but recently I get errors like:
ERROR: An I/O error has occurred on file WORK.'SASTMP-000000030'n.UTILITY.
ERROR: File WORK.'SASTMP-000000030'n.UTILITY is damaged. I/O processing did not complete.
NOTE: Error was encountered during utility-file processing. You may be able to execute the SQL statement successfully if you allocate more space to the WORK library.
ERROR: There is not enough WORK disk space to store the results of an internal sorting phase.
ERROR: An error has occurred.
Does anyone know how to solve this error?

Your disk is full. If this is running on a server, ask your system administrator to investigate the problem.
If this is your desktop, find and delete un-needed files to free up space.
Clean out old SAS Work Folders
Often, old SAS Work folders do not get cleared when SAS closes. You can get back a lot of disk space by going to the path defined for SAS Work, and deleting all the old folders.
In SAS
%put %sysfunc(pathname(work));
will show you where the current WORK library is located. One level up is where all SAS Work folders are created.
On my system, that returns:
C:\Users\dpazzula\AppData\Local\Temp\SAS Temporary Files\_TD9512_GXM2L12-PAZZULA_
That means that I should look in "C:\Users\dpazzula\AppData\Local\Temp\SAS Temporary Files\" to find old folders to delete.
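Finding the leftover folders can be scripted. The following is a minimal Python sketch (not SAS code). It assumes that a sibling folder whose timestamp is older than a chosen age belongs to a SAS session that has already ended. The function name `find_stale_work_dirs` is hypothetical. It only lists candidates and deletes nothing.

```python
import os
import time
from pathlib import Path


def find_stale_work_dirs(current_work: str, min_age_days: float = 1.0) -> list:
    """List sibling folders of the live WORK folder that look abandoned.

    Assumption: a folder not modified for min_age_days belongs to a finished
    SAS session. A folder's timestamp only approximates activity, so review
    the list (and check no SAS session is running) before deleting anything.
    """
    current = Path(current_work).resolve()
    parent = current.parent  # "one level up" holds every session's WORK folder
    cutoff = time.time() - min_age_days * 86400
    stale = []
    for entry in parent.iterdir():
        if not entry.is_dir() or entry.resolve() == current:
            continue  # skip plain files and the current session's own folder
        if entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return sorted(stale)
```

Pass it the path that `%put %sysfunc(pathname(work));` prints, then review the returned folders by hand before removing them.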

Your work space is full.
Your SAS server uses a dedicated directory where all SAS sessions store their temporary files: all data sets in the WORK libraries, as well as the utility files used while sorting, joining, etc.
Solutions:
Have more space allocated.
Put only necessary files into WORK, clean up after yourself, and close old sessions.
Run fewer processes.

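Replace interim data sets with views, especially if you're using large source data sets. A DATA step view stores only the program, so the rows are read from the source when the view is used instead of being copied into WORK: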
data master /view=master ;
set lib.monthlydata20: ; /* all datasets since Jan 2000 */
run ;
proc sql ;
create table want as
select *
from master
where ID in(select ID from lookup) ;
quit ;
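As a language-neutral analogy (Python, not SAS), a view behaves like a generator: rows stream from the sources to the consumer and no combined copy is stored. The function names and the sample tables in the test are hypothetical.

```python
from typing import Iterable, Iterator


def master_view(monthly_tables: Iterable[Iterable[dict]]) -> Iterator[dict]:
    """Analogue of `data master / view=master; set lib.monthlydata20: ; run;`.

    Rows are yielded one at a time from each source table, so nothing is
    materialized; the reading happens only when a consumer iterates.
    """
    for table in monthly_tables:
        yield from table


def want(master: Iterable[dict], lookup_ids: set) -> list:
    """Analogue of the PROC SQL step: keep rows whose ID is in the lookup."""
    return [row for row in master if row["ID"] in lookup_ids]
```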

Try compressing all data sets with these options:
OPTIONS COMPRESS=YES REUSE=YES;
Put this at the very beginning of your code. Depending on the data, it can shrink data sets considerably (character variables padded with blanks compress best), and it can speed up I/O-bound steps. It costs more CPU in exchange for the smaller size.
In some cases this might not help, for example if even the compressed data sets exceed the available disk space.
Also, move your WORK directory to the drive with the most free space (for example with the -WORK option at SAS startup).
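Why the saving depends on the data: SAS documents COMPRESS=YES (equivalent to COMPRESS=CHAR) as run-length encoding (RLE). The toy RLE below is a Python illustration of the idea only, not SAS's actual on-disk format. A blank-padded character value shrinks a lot, while data with no repeated bytes actually grows.

```python
def rle_encode(data: bytes) -> bytes:
    """Toy run-length encoding: each run becomes (count, byte), count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run_byte = data[i]
        run_len = 1
        while i + run_len < len(data) and data[i + run_len] == run_byte and run_len < 255:
            run_len += 1
        out += bytes((run_len, run_byte))
        i += run_len
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode: expand each (count, byte) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


def saving_pct(data: bytes) -> float:
    """Percent of space saved for non-empty data (negative means it grew)."""
    return 100.0 * (1 - len(rle_encode(data)) / len(data))
```

For example, `b"SMITH"` stored in a 200-byte character field encodes to 12 bytes, but `bytes(range(256))` doubles to 512 bytes.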

Study your code.
Create a data flow diagram to determine WHEN each file is created and where it is used downstream. Find out when a data set is no longer needed and DELETE it. If you have 50 data sets, chances are many of them are superseded ('value-added') by a subsequent step and can go away, freeing up your WORK space. A cute trick is to REUSE some of the data set names, to keep the number of unneeded data sets in check.
Rule of thumb: leave the environment the way you found it. If there were no files in WORK to start with, clean up after yourself manually. The exception is a Stored Process, which starts a completely new SAS job and cleans up after itself when the job completes.
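The "when is each data set last needed" part of that analysis is mechanical once the steps are listed. Here is a minimal Python sketch (not SAS). It assumes each step is described by hypothetical (name, inputs, outputs) tuples and that only data sets produced by the program itself are candidates for deletion. In SAS the actual deletion would then be done with, for example, PROC DATASETS and a DELETE statement.

```python
def deletion_plan(steps, keep=()):
    """For each step, list the produced data sets that can be deleted after it.

    steps: ordered list of (step_name, inputs, outputs); step names are unique.
    A data set can go once the last step that reads it has run, unless it is
    in `keep`. Inputs never produced here (e.g. permanent tables) are ignored.
    """
    produced = {ds for _, _, outputs in steps for ds in outputs}
    last_read = {}
    for idx, (_name, inputs, _outputs) in enumerate(steps):
        for ds in inputs:
            last_read[ds] = idx  # a later read overwrites an earlier one
    plan = {name: [] for name, _, _ in steps}
    for ds in sorted(last_read):
        if ds in produced and ds not in keep:
            plan[steps[last_read[ds]][0]].append(ds)
    return plan
```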

Related

SAS PROC SORT ERROR: No disk space is available for the write operation

I have a database whose size after importing into SAS is around 600 MB.
(I use OPTIONS COMPRESS=YES at the start of my program.)
Then I derive some columns/variables and get a final database of around 800 MB.
The final database has 1,929,743 observations.
What I want
I want to sort the data in descending order of PUBLICATION_DATE within each value of ITEM in my final database.
My code so far
PROC SORT DATA=newdb.access_db OUT= newdb.access_sorted;
BY ITEM DESCENDING PUBLICATION_DATE;
RUN;
The error which I am getting
ERROR: No disk space is available for the write operation. Filename = C:\Users\AB364273\AppData\Local\Temp\SAS Temporary Files\SAS_util00010000204C_A00DVDPCSAS2007\ut204C000008.utl.
ERROR: Failure while attempting to write page 134 of sorted run 11.
ERROR: Failure while attempting to write page 40544 to utility file 1.
ERROR: Failure encountered while creating initial set of sorted runs.
ERROR: Failure encountered during external sort.
ERROR: Sort execution failure.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1244486 observations read from the data set NEWDB.ACCESS_DB.
WARNING: The data set NEWDB.ACCESS_SORTED may be incomplete. When this step was stopped there were 0 observations and 57 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 2:17.20
cpu time 14.66 seconds
My database is not so large that a "no disk space" error should appear.
Also, I have a lot of space on my hard disk (around 500 GB on the drive where I store the database via LIBNAME, and 8 GB on the C drive).
I have 4 GB of RAM.
Given all this, I don't understand why this error appears. Is there any way I can get the desired output?
If you have 8GB free on C drive, then that is likely your problem.
Sorting happens in a temporary (scratch) file, and that file can be up to three times as large as the original file. It also has to be on uncompressed data, for obvious reasons. As such, if your uncompressed file is say 3-4 GB in size, it wouldn't be sortable on the 8GB drive.
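The arithmetic can be checked quickly. Below is a Python sketch of the answer's rule of thumb (up to about 3x the uncompressed size). The 2,000-byte observation length is a made-up value for illustration; PROC CONTENTS reports the real "Observation Length" for a data set.

```python
import shutil


def sort_scratch_bytes(n_obs: int, obs_length: int, factor: float = 3.0) -> float:
    """Rough utility space for an external sort: factor x uncompressed size.

    Utility files hold uncompressed rows even when COMPRESS=YES is set,
    so use the uncompressed observation length here.
    """
    return n_obs * obs_length * factor


def free_bytes(path: str) -> int:
    """Free space on the drive holding `path` (e.g. the WORK/UTILLOC folder)."""
    return shutil.disk_usage(path).free
```

With 1,929,743 observations and the hypothetical 2,000-byte length, `sort_scratch_bytes(1929743, 2000)` is 11,578,458,000 bytes (about 11.6 GB), which would not fit in 8 GB free on C:.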
You can solve this by either moving your work location to a larger drive (or freeing up space), or by using the TAGSORT option, which reduces the utility file usage at the cost of speed (See SAS documentation for more details).
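The idea behind TAGSORT can be sketched in Python (this is an illustration of the concept, not SAS's implementation). Only small (key, row number) tags are sorted. The full rows are then fetched in tag order, which saves scratch space but costs an extra pass over the data. The row number also breaks ties, so equal keys keep their input order. The sort key below models `BY ITEM DESCENDING PUBLICATION_DATE`, assuming dates are stored as numbers the way SAS dates are.

```python
def tag_sort(rows, key):
    """Sort only (key, row number) tags, then fetch full rows in that order."""
    tags = sorted((key(row), i) for i, row in enumerate(rows))
    return [rows[i] for _, i in tags]


def by_item_desc_date(row):
    """BY ITEM DESCENDING PUBLICATION_DATE, with numeric (SAS-style) dates."""
    return (row["ITEM"], -row["PUBLICATION_DATE"])
```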
You also could request it from the database sorted; this is what I'd recommend if you're sorting by fields in the database (not by modified fields). You don't even have to use proc sort in most cases; if the database is in a libname db:
data access_sorted;
set db.access_db_Table;
by item descending publication_date;
run;
That will work just fine and will ask for it in sorted order directly from the database.
My first thought was what @Joe said: your WORK library location is lacking space, even if you have enough space overall.
I don't know the answer to this, but is an ORDER BY clause in PROC SQL less expensive in terms of the temporary space required? You could at least try it.

SAS out of memory error

I'm getting a "The remote Process is out of memory" error in SAS DIS (Data Integration Studio).
Since it is possible that my approach is wrong, I'll explain the problem I'm working on and the solution I've decided on:
I have a large list of customer names which need cleaning. To achieve this, I use a .csv file containing regular expression patterns and their corresponding replacements. (I use this approach because it is easier to add new patterns to the file and upload it to the server for the deployed job to read from, rather than hardcoding new rules and redeploying the job.)
In order to get my data step to make use of the rules in the file I add the patterns and their replacements to an array in the first iteration of my data step then apply them to my names. Something like:
DATA &_OUTPUT;
   /* One compiled regex id per rule; _temporary_ arrays are retained across iterations */
   ARRAY rule_nums{1:&NOBS} _temporary_;
   IF (_n_ = 1) THEN
      DO i = 1 TO &NOBS;
         SET WORK.CLEANING_RULES;
         rule_nums{i} = PRXPARSE(CATS('s/', rule_string_match, '/', rule_string_replace, '/i'));
      END;
   SET WORK.CUST_NAMES;
   customer_name_clean = customer_name;
   /* Apply every rule in file order (1 = replace the first match only) */
   DO i = 1 TO &NOBS;
      customer_name_clean = PRXCHANGE(rule_nums{i}, 1, customer_name_clean);
   END;
RUN;
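The same compile-once, apply-per-row pattern looks like this in a Python sketch (not SAS). The column names match the question; the function names are hypothetical. One assumption differs from SAS: replacement strings use Python syntax (`\1`, not `$1`). `count=1` mirrors the `1` passed to PRXCHANGE; `count=0` would replace every match, like `-1` in PRXCHANGE.

```python
import csv
import io
import re


def load_rules(csv_text: str):
    """Compile every (rule_string_match, rule_string_replace) pair once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (re.compile(r["rule_string_match"], re.IGNORECASE), r["rule_string_replace"])
        for r in reader
    ]


def clean_name(name: str, rules) -> str:
    """Apply each compiled rule in file order, replacing the first match only."""
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name, count=1)
    return name
```

Because the patterns are compiled once, the per-row cost grows with the number of rules, not with re-reading the rules file.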
When I run this on around ~10K rows or less, it always completes and finishes extremely quickly. If I try on ~15K rows it chokes for a super long time and eventually throws an "Out of memory" error.
To try and deal with this I built a loop (using the SAS DIS loop transformation) wherein I number the rows of my dataset first, then apply the preceding logic in batches of 10000 names at a time. After a very long time I got the same out of memory error, but when I checked my target table (Teradata) I noticed that it ran and loaded the data for all but the last iteration. When I switched the loop size from 10000 to 1000 I saw exactly the same behaviour.
For testing purposes I've been working with only ~500K rows, but I will soon have to handle millions and am worried about how this is going to scale. For reference, the set of cleaning rules I'm applying currently has 20 rows but may grow to a few hundred.
Is it significantly less efficient to use a file with rules rather than hard coding the regular expressions directly in my datastep?
Is there any way to achieve this without having to loop?
Since my dataset gets overwritten on every loop iteration, how can there be an out of memory error for datasets that are 1000 rows long (and like 3 columns)?
Ultimately, how do I solve this out of memory error?
Thanks!
The issue turned out to be that the log that the job was generating was too large. The possible solutions are to disable logging or to redirect the log to a location which can be periodically purged and/or has enough space.

Is there any possibility that deleted data can be recovered back in SAS?

I am working in a production environment. Yesterday I accidentally made permanent changes to the master dataset while trying to get a sample of it into the WORK directory. Unfortunately, there is no backup of this data.
I wanted to execute this:
Data work.facttable;
Set Master.facttable(obs=10);
run;
instead of this, accidentally I executed the following:
data Master.facttable;
set Master.facttable(obs=10);
run;
You can clearly see what sort of blunder it was!
The fact table had been built up over nearly two years; it was 250 GB with millions of rows. Now it has 10 rows and is 128 KB :(
I am very worried about how to recover the data. It is crucial for the business teams, and I have no idea how to proceed.
I know that SAS doesn't support rollback or any recovery process, and we don't use audit trails either.
I am just wondering whether there is any way we can still get the data back in spite of all this.
Details: the dataset is assigned with the SPDE engine. I checked the data files (.dpf), but all of them have disappeared except yesterday's data file, which is 128 KB.
You appear to have exhausted most of the simple options already:
Restore from external/OS-level backup
Restore from a previous generation via the GENNUM= data set option (only available if the GENMAX= option was set to 2 or more when the dataset was created, since GENMAX counts the base version).
Restore from a SAS audit trail.
I think that leaves you with just 2 options:
Rebuild the dataset from the underlying source(s), if you still have them.
Engage the services of a professional data recovery company, who might be able to recover some or all of the deleted files, depending on the complexity of your storage environment, and how much of the original 250GB has since been overwritten.
Either way, it sounds as though this may prove to have been an expensive mistake.

sort error at end of proc sql for inner join

I ran the following code, and an hour later, just as the code was finishing, a sort execution error occurred. Is there something wrong with my code, or are my computer's processor and RAM insufficient?
proc sql;
create table today as
select a.account_number, a.client_type, a.device ,a.entry_date_est,
a.entry_time_est, a.duration_seconds, a.channel_name, b.esn, b.service_start_date,
b.service_end_date, b.product_name, b.billing_frequency_fee, b.plan_category,
b.plan_subtype, b.plan_type
from listen_nomiss a inner join service_nomiss b
on (a.account_number = b.account_number)
order by account_number;
quit;
That error is most commonly seen when you run out of utility space to perform the sort. A few suggestions for troubleshooting are available in this SAS KB post; the most useful suggestions:
options fullstimer msglevel=i ; will give you a lot more information about what's going on behind the scenes, so you can troubleshoot what is causing the issue
proc options option=utilloc; run; will tell you where the utility directory is that your temporary files will be created in for the sort. Verify that about 3 times the space needed for the final table is available - sorting requires roughly 3 times the space in order to properly sort the dataset due to how the sort is processed.
options compress=yes; will save some (possibly a lot of) space if not already enabled.
proc options option=(memsize sortsize); run; will tell you how much memory is allocated to SAS, and at what size a sort is done in memory versus on disk. SORTSIZE should be about 1/3 of MEMSIZE (given the requirement of 3x space to process it). If your final table is just over SORTSIZE, you may be better off increasing SORTSIZE if the default is too low (the same goes for MEMSIZE, which can only be set when SAS starts).
You could also have some issues with permissions; some of the other suggestions in the kb article relate to verifying you actually have permission to write to the utility directory, or that it exists at all.
I've had a project in the past where resources were an issue as well.
A couple of ways around it when sorting were:
Don't forget that proc sort has a TAGSORT option, which will make it first only sort on the by statement variables and attach everything else afterwards. Useful when having many columns not involved in the by statement.
Indexes: if you build an index of exactly the variables in your by-statement, you can use a by statement without sorting, it will rely on the index.
Split it up: you can split up the dataset in multiple chunks and sort each chunk separately. Then you do a data step where you put them all in the set statement. When you use a by statement there as well, SAS will weave the records so that the result is also according to the by-statement.
Note that these approaches come with a performance hit (the third one perhaps to a lesser extent), and indexes can give you headaches if you don't take them into account later on (or deliberately remove them).
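The "split it up" approach can be sketched in Python (an illustration, not SAS). Each chunk is sorted on its own, and `heapq.merge` then interleaves the sorted chunks by key, the way `SET piece1 piece2 ...; BY ...;` interleaves sorted data sets in a DATA step. The function name and chunk size are hypothetical.

```python
import heapq
from itertools import islice


def chunked_sort(rows, key, chunk_size):
    """Sort fixed-size chunks independently, then interleave them by key."""
    it = iter(rows)
    chunks = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        chunks.append(sorted(chunk, key=key))  # each chunk is small enough to sort alone
    return list(heapq.merge(*chunks, key=key))
```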
One note if/when you rewrite the whole join as a SAS merge: keep in mind that a SAS merge does not by itself mimic many-to-many joins (it does one-to-one, one-to-many and many-to-one). That is probably not the case here (it rarely is), but I mention it to be on the safe side.
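To see the difference for a single BY value that has 2 rows on one side and 3 on the other, here is a Python model (not SAS code). The SQL inner join returns all 6 pairings. The DATA step merge pairs rows by position and carries the shorter side's last row forward, because SAS retains the values it read last, so it returns 3 rows. The model assumes the BY value is present in both data sets; the function names are hypothetical.

```python
def sql_inner_join(left, right):
    """One BY group: every left row pairs with every right row."""
    return [(l, r) for l in left for r in right]


def datastep_style_merge(left, right):
    """Rough model of MERGE ...; BY ...; for one group present in both inputs.

    Rows are paired by position; when one side runs out, its last row is
    repeated, mimicking how SAS keeps the last values it read.
    """
    if not left or not right:
        raise ValueError("this model only covers groups present in both inputs")
    n = max(len(left), len(right))
    return [(left[min(i, len(left) - 1)], right[min(i, len(right) - 1)]) for i in range(n)]
```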

How to compare 2 volumes and list modified files?

I have 2 hard-disk volumes (one is a backup image of the other). I want to compare the volumes and list all the modified files, so that the user can select the ones he/she wants to roll back.
Currently I'm recursing through the new volume and comparing each file's timestamps to the old volume's files (if they are in the old volume). Obviously this is a brute-force approach. It's time-consuming and wrong!
Is there an efficient way to do it?
EDIT:
- I'm using FindFirstFile and the like to recurse the volume and gather info on each file (not very slow, just a few minutes).
- I'm using Volume Shadow Copy to backup.
- The backup-volume is remote so I cannot continuously monitor the actual volume.
Part of this depends upon how the two volumes are duplicated; if they are 'true' copies from the file system's point of view (e.g. shadow copies or other block-level copies), you can do a few tricky little things with respect to USN, which is the general technology others are suggesting you look into. You might want to look at an API like FSCTL_READ_FILE_USN_DATA, for example. That API will let you compare two different copies of a file (again, assuming they are the same file with the same file reference number from block-level backups). If you wanted to be largely stateless, this and similar APIs would help you a lot here. My algorithm would look something like this:
foreach (file in backup_volume) {
    file_still_exists = try_open_by_id(modified_volume)
    if (file_still_exists) {
        usn_result = compare_usn_values_of_files(file, file_in_modified_volume)
        if (usn_result == equal_to) {
            // file hasn't changed at all
        } else {
            // file has changed (somehow)
        }
    } else {
        // file was deleted (possibly deleted and recreated)
    }
}
// we still don't know about files new in modified_volume
All of that said, my experience leads me to believe that this will be more complicated than my off-the-cuff explanation hints at. This might be a good starting place, though.
If the volumes are not block-level copies of one another, then it will be very difficult to compare USN numbers and file IDs, if not impossible. Instead, you may very well be going by file name, which will be difficult if not impossible to do without opening every file (times can be modified by apps, sizes and times can be out of date in the findfirst/next queries, and you have to handle deleted-then-recreated cases, rename cases, etc.).
So knowing how much control you have over the environment is pretty important.
Instead of waiting until after changes have happened, and then scanning the whole disk to find the (usually few) files that have changed, I'd set up a program to use ReadDirectoryChangesW to monitor changes as they happen. This will let you build a list of files with a minimum of fuss and bother.
Assuming you're not comparing each file on the new volume to every file in the snapshot, that's the only way you can do it. How are you going to find which files aren't modified without looking at all of them?
I am not a Windows programmer.
However, shouldn't you have a stat function to retrieve a file's modification time?
Sort the files by modification time.
The files with a modification time later than your last backup time are the ones you're interested in.
The first time, you can iterate over the backup volume to find the maximum modification and creation times in your set of interest.
I am assuming the directories of interest don't get modified in the backup volume.
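The stat-based idea is easy to sketch with Python's portable `os.walk` and `os.stat` (the question itself uses FindFirstFile on Windows). `since` is assumed to be the last backup time as a POSIX timestamp. As another answer points out, applications can set modification times themselves, so treat this as a heuristic.

```python
import os


def files_modified_since(root: str, since: float) -> list:
    """Return (mtime, path relative to root) for files modified after `since`.

    The result is sorted by modification time, oldest first.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtime = os.stat(full).st_mtime
            if mtime > since:
                hits.append((mtime, os.path.relpath(full, root)))
    return sorted(hits)
```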
Without knowing more details about what you're trying to do here, it's hard to say. However, some tips about what I think you're trying to achieve:
If you're only concerned about NTFS volumes, I suggest looking into the USN / change journal APIs. They have been around since 2000. This way, after the initial inventory, you only need to look at changes from that point on. A good, though very old, starting point is this article: http://www.microsoft.com/msj/0999/journal/journal.aspx
Also, by using the USN APIs you could skip the hash step and just record information from the journal yourself (this will become clearer when/if you look into those APIs).
The first time through comparing a drive's contents, utilize a hash such as SHA-1 or MD5.
Store hashes and other such information in a database of some sort. For example, SQLite3. Note that this can take up a huge amount of space itself. A quick look at my audio folder with 40k+ files would result in ~750 megs of MD5 information.
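The hash-and-compare step can be sketched in Python. An in-memory dict stands in for the database (SQLite or similar). MD5 is used only to detect changes, not for security. Files are matched by relative path, so a rename shows up as one deletion plus one addition. The function names are hypothetical.

```python
import hashlib
import os


def hash_tree(root: str) -> dict:
    """Map each file's path (relative to root) to the MD5 of its contents."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(full, "rb") as fh:
                for block in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
                    h.update(block)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(old: dict, new: dict) -> dict:
    """Classify paths as modified, deleted (only in old) or added (only in new)."""
    return {
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
        "deleted": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }
```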