SAS code runs with 0 observations if called from %INCLUDE - sas

I'd like to start by saying I'm by no means a SAS whiz.
I inherited SAS code from a team that no longer exists, written by people who no longer work here, so there is nobody around who is familiar with how things work.
The structure is as follows:
We have a SAS program that acts as a scheduler, triggering a selection of smaller programs on a daily basis. It checks the time of day and, based on that, triggers programs stored on the server by using %include statements.
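For readers unfamiliar with this pattern, a time-of-day scheduler of the kind described might look roughly like the sketch below. The real scheduler's conditions are not shown in the question; the macro name `run_if_after` and the program path are hypothetical.

```sas
/* Hypothetical sketch of a time-of-day scheduler; names and paths are assumptions. */
%macro run_if_after(start_hour, program);
  /* TIME() returns the current time of day; HOUR() extracts the hour from it */
  %if %sysfunc(hour(%sysfunc(time()))) >= &start_hour %then %do;
    %put NOTE: Including &program;
    %include "&program";
  %end;
%mend run_if_after;

/* Run the (hypothetical) daily load program once it is 06:00 or later */
%run_if_after(6, /sasdata/programs/daily_load.sas);
```

Wrapping the %IF logic in a macro keeps it valid on SAS releases that do not allow %IF in open code.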
This has worked flawlessly for the past 2 years, but since yesterday all the programs triggered by this scheduler run with 0 observations.
If I manually open a program on the server (the same program the scheduler triggers), it runs fine. If the scheduler triggers it, the log shows that the data set has 0 observations and the step is stopped.
This happens for every step in a program, starting with the first one, which can be as simple as the step below:
data drawdown;
set server01.legacy_mapping_drawdown;
run;
If I run the above step manually, the log shows:
NOTE: The data set WORK.drawdown has 13643 observations and 107 variables.
If this is triggered by the %include statement, then the log reads:
NOTE: The data set WORK.drawdown has 0 observations and 107 variables.
WARNING: Data set WORK.drawdown was not replaced because this step was stopped.
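To narrow this down, one option is to print, at the top of an included program, the system options that change when a batch session enters syntax-check mode after an error (OBS is set to 0 and existing data sets are not replaced), together with the automatic macro variable SYSERR. This is a diagnostic sketch only; GETOPTION and SYSERR are standard SAS features, but whether syntax-check mode explains this particular failure is an assumption.

```sas
/* Show the options that syntax-check mode changes; OBS=0 would be a strong hint. */
%put NOTE: OBS=%sysfunc(getoption(obs)) REPLACE=%sysfunc(getoption(replace));
/* Return code of the most recently executed step */
%put NOTE: SYSERR=&syserr;

data drawdown;
  set server01.legacy_mapping_drawdown;
run;
```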
I have no clue whatsoever as to why this would be happening.
The fact that this started happening on the 02/02/2020 leads me to believe that the new year might have something to do with it.
The scheduler code hasn't been touched in a while, and the programs are still being triggered; what changes is how they perform depending on whether they are run manually or via the scheduler.
I know there are few technical details here, but there isn't much more to it.
I would appreciate any ideas on this.
Thanks.

Related

BigQueryIO - only first day table can be created, despite having CreateDisposition.CREATE_IF_NEEDED

I have a dataflow job processing data from pub/sub defined like this:
read from pub/sub -> process (my function) -> group into day windows -> write to BQ
I'm using Write.Method.FILE_LOADS because of the bounded input.
My job works fine, processing many GBs of data, but it fails and retries forever when it needs to create another table. The job is meant to run continuously and create daily tables on its own; it handles the first few correctly but then indefinitely reports:
Processing stuck in step write-bq/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 05h30m00s without outputting or completing in state finish
Before this happens it also throws:
Load job <job_id> failed, will retry: {"errorResult":{"message":"Not found: Table <name_of_table> was not found in location US","reason":"notFound"}
The error is accurate, because this table doesn't exist. The problem is that the job should create it on its own, because CreateDisposition.CREATE_IF_NEEDED is set.
The number of daily tables it creates correctly depends on the number of workers. It seems that once a worker creates one table, its CreateDisposition changes to CREATE_NEVER, causing the problem, but this is only my guess.
A similar problem was reported here, but without any definite answer:
https://issues.apache.org/jira/browse/BEAM-3772?focusedCommentId=16387609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16387609
The ProcessElement definition here seems to give some clues, but I can't really say how it works with multiple workers: https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L138
I'm using Apache Beam SDK 2.15.0.
I encountered the same issue, which is still not fixed in Beam 2.27.0 (January 2021). Therefore I had to develop a workaround: a custom PTransform that checks whether the target table exists before the BigQueryIO stage. It uses the BigQuery Java client and a Guava cache, as well as a windowing strategy (fixed windows, checking every 15s), to sustain heavy traffic of about 5000 elements per second. Here is the code: https://gist.github.com/matthieucham/85459eff5fdea8d115be520e2dd5ccc1
There was a bug in the past that caused this error, but that particular one was fixed in commit https://github.com/apache/beam/commit/d6b4dcec5f297f5c1bd08f345f0e1e5c756775c2#diff-3f40fd931c8b8b972772724369cea310. Can you check whether the version of Beam you are running includes this commit?

Run parallel program from SAS EG using %macro

I'm having trouble finding out whether what I want to do is possible in SAS EG 7.15.
Context
We have split a fairly big project into several process flows, each with a program that defines a %macro, plus a final process flow containing a program called "EXECUTION" that calls all of those macros in a row, like this:
Flow_Process_1/program:
%macro TheFirstOne;
  some code
%mend TheFirstOne;
(other process flows)
Flow_Process_123/EXECUTION:
%TheFirstOne;
%TheSecondOne;
%TheThirdOne;
...
The advantage is that everything is split up and easy to execute: click the first process flow, Shift+click the last one, and everything is executed. You also get a single log covering all the different programs. Everything works perfectly.
The Problem
The problem is that we have a fairly long-running execution that we wanted to split into parallel processes. This is not hard to set up with the "Allow parallel execution on the same server" option, and that feature works like a charm for programs that run on their own.
However, with the way we work, the first process (thread, I would say) knows about our %macro definitions and works perfectly, but the second and any others don't know about them and give errors like:
WARNING: Apparent invocation of macro xxxxxxx not resolved.
since the program that defines the macro was never executed in that session.
My attempt
I tried using the Autoexec feature, but then I have to copy every macro definition (5000+ lines) into it for it to work in every parallel process. I also looked for SAS code that runs a process flow from within a program, but it seems not to exist, or I did not find it.
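One common alternative to pasting every definition into the autoexec is the autocall facility: the autoexec only points SAS at a folder of macro files, and each session compiles a macro the first time it is invoked. A minimal sketch, assuming each macro is saved on the SAS server in its own .sas file named after the macro (lowercase on UNIX servers); the folder path is hypothetical.

```sas
/* Autoexec: add a hypothetical macro folder to the autocall search path.        */
/* Each file must be named after its macro, e.g. thefirstone.sas for %TheFirstOne. */
options mautosource sasautos=("/sas/projects/myproject/macros", sasautos);
```

Keeping `sasautos` in the list preserves the default SAS-supplied autocall macros.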
I'm quite sure that SAS seniors will say this is not how SAS EG should be used, but this is how we did it, and I would just like to know whether there is a way to make all the previously defined %macro definitions known to the other processes.

Running SAS script using %include

I'm a newbie in SAS, coming from SQL, so I'm dealing with the differences.
I have a SAS program "Master.sas" that runs, among other things, something like this:
%include "c:\script1.sas";
%include "c:\script2.sas";
%include "c:\script3.sas";
The question is, if I select all of them and run it, does it run sequentially or in parallel?
For example, if script2 uses a table that is loaded in script1, will it fail to run successfully?
That example may sound obvious, and I tested it, but what happens if script1 calculates a variable: will script2 see the calculated variable, or use whatever it finds at run time (because, for example, script2 ran before script1)?
Just to clarify, I need that SAS run them sequentially, one after other.
In SQL there is "GO" to separate batches, e.g.:
CREATE TABLE XXXXX
GO
SELECT * FROM XXXXX
GO
If someone runs that script without GO, SQL runs the statements in parallel, producing an error on the second one saying that "table XXXXX doesn't exist".
Do I need something similar in SAS, or does SAS just process the next script when the previous one has finished?
Thanks in advance!
%include will run things in sequence. SAS will run the first %include as if it were just lines in the code, then hit the second and do the same, etc.
SAS's equivalent of GO is RUN, by the by, though in most cases RUN doesn't strictly have to be included (it's considered good practice). SAS will not run in parallel just because you leave out RUN; RUN is what tells SAS to go ahead and execute the code it has been given. This does not apply to PROC SQL, however: it does not support run-group processing, and it executes each statement terminated by ; immediately.
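A minimal illustration of that sequencing, assuming script1.sas creates a table that script2.sas reads. SASHELP.CLASS is the sample table that ships with SAS, and work.xxxxx is carried over from the question; the file contents are hypothetical.

```sas
/* Hypothetical contents of c:\script1.sas */
data work.xxxxx;
  set sashelp.class;
run;  /* step boundary: the DATA step executes here, before SAS reads further */

/* Hypothetical contents of c:\script2.sas */
proc sql;
  /* PROC SQL executes each statement as soon as it is submitted */
  select count(*) as n_rows from work.xxxxx;
quit;  /* QUIT, not RUN, ends a PROC SQL step */

/* Master.sas reads and executes each %include in order: */
/* %include "c:\script1.sas"; */
/* %include "c:\script2.sas"; */
```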
There are ways to make it run in parallel; for example, this hands-on workshop from SUGI 29 on parallel processing shows how to use RSUBMIT to do so. Enterprise Guide allows parallel processing of programs (but not of %includes within one program) if you tell it to; it does not do so by default.
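For comparison, a rough sketch of running two independent scripts in parallel with SAS/CONNECT (MP CONNECT), assuming SAS/CONNECT is licensed and that spawning local child sessions with `sascmd="!sascmd"` is allowed on your system. The task names are hypothetical. Each child session has its own WORK library and its own macro definitions, so only scripts that do not depend on each other should be split this way.

```sas
/* Start two local child sessions using the same command that started this one */
signon task1 sascmd="!sascmd";
signon task2 sascmd="!sascmd";

/* WAIT=NO returns control immediately, so both tasks run concurrently */
rsubmit task1 wait=no;
  %include "c:\script1.sas";
endrsubmit;

rsubmit task2 wait=no;
  %include "c:\script3.sas";
endrsubmit;

/* Block until both tasks have finished, then end the child sessions */
waitfor _all_ task1 task2;
signoff _all_;
```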
%include will run things in sequence. If the code in the first %include hits an error, the remaining lines will not process data normally (in a batch run, SAS switches to syntax-check mode).
%include will always run things in sequence.
If a variable is created in script1, you can use it in script2, but if script1 depends on a variable created in script2, it will error out.

SAS code behaves differently in interactive and batch modes

I have the following code running inside a macro. When it is run in interactive mode, it runs absolutely fine, with no errors or warnings. That has been the case for the last two years.
The same code has now been deployed in batch mode, and it generates the warning WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved. and no value is assigned to the macro variable.
My question is: does anyone have any idea why batch mode and interactive mode would behave differently?
Here is some more information:
The dataset is being created and it is in the WORK library.
The dataset does get opened by the data step.
FIRSTRECCOUNT doesn't get initialised anywhere else in the program.
I have searched the SAS community. There is a topic there, but I don't have the same errors in batch initialisation as described in the answer.
There is detailed information on the warning, but it doesn't explain why it would work in interactive mode but not in batch mode.
%LET FIRSTSET = work.dataset1;

DATA _NULL_;
  IF 0 THEN
    SET &FIRSTSET NOBS=X;
  CALL SYMPUT('FIRSTRECCOUNT', X);
  STOP;
RUN;

DATA _NULL_;
  IF 0 THEN
    SET &SECONDSET NOBS=X;
  CALL SYMPUT('SECONDRECOUNT', X);
  STOP;
RUN;
WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved.
Update:
So I attempted to replicate the error by copying the code with the warning into a separate scheduled flow, but it didn't cause any errors at all.
By the way, the original job was deployed from SAS DI Studio. I checked all lines in the user-written code nodes and made sure that each was within 80 characters, as recommended by #RawFocus and #RobertPentridge, but that didn't solve the issue.
As recommended by #data_null_, I checked VALIDVARNAME, and it was different between interactive mode (value "any") and batch mode (value "V7"), but changing it made no difference.
I rewrote the logic to get the number of observations by calling ATTRN on an open dataset. This eliminated that warning, but the program still failed, with warnings popping up in different places. That made me think Robert Partridge is correct. At the same time, I got an error about a macro not being resolved. The macro had been inserted by DI Studio to collect performance MI, even though the job wasn't meant to collect MI. This made me think that SAS DI Studio is not generating code correctly when deploying it, so I manually edited the deployed code to remove the offending macro call. I also spotted one line of code with an MD5 function call that was too long because of the number of parameters passed to it, so I inserted some white space. And finally the problem was fixed!
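The rewritten code is not shown; a sketch of the OPEN/ATTRN approach described, wrapped in a hypothetical macro named `count_obs` and reusing the data set and macro variable names from the question, might look like this. ATTRN with NLOBS returns the number of logical (non-deleted) observations without running a DATA step.

```sas
/* Hypothetical helper: store the observation count of &ds in the macro variable named by &outvar */
%macro count_obs(ds, outvar);
  %global &outvar;
  %local dsid rc;
  %let dsid = %sysfunc(open(&ds));  /* 0 means the data set could not be opened */
  %if &dsid > 0 %then %do;
    %let &outvar = %sysfunc(attrn(&dsid, nlobs));
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %put WARNING: Could not open &ds..;
%mend count_obs;

%count_obs(work.dataset1, firstreccount)
%put NOTE: FIRSTRECCOUNT=&firstreccount;
```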
I still need to do something about the job, because when it is redeployed from SAS DI, it will generate the same errors again. I don't have time to look into this further at the moment.
Conclusion: what you write in SAS DI and what gets deployed can be slightly different, which can cause the syntax parser to throw errors in random places. So I will mark Robert's answer as correct, because it got me closer to solving the problem than any other answer.
The problem could be happening above the code snippet you pasted. The parser got into a funk earlier, and ended up issuing warning about code that is perfectly fine.
Check that no code within a macro is longer than ~160 characters on a single line. I try to keep my code well below that, but long lines of code can run fine interactively and fail in batch, particularly inside a macro.
I expect your program has some small error above that does not cause SAS to go into syntax check mode when run interactively but does cause SAS to set obs to 0 and enter syntax check mode when run in batch.
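If that is what is happening, the batch log usually contains a note that SAS set option OBS=0 and will continue to check statements. A minimal sketch of two commonly used responses, assuming they go at the top of each included program: restore OBS and REPLACE, or stop the batch session from entering syntax-check mode. Neither fixes the earlier error itself; they only stop it from cascading into later steps, and disabling syntax checking means later steps run even after an earlier one failed.

```sas
/* Restore normal processing in case an earlier error put the batch session
   into syntax-check mode (OBS=0, existing data sets not replaced). */
options obs=max replace;

/* Alternatively, prevent batch sessions from entering syntax-check mode at all.
   Use with care: later steps will then run against possibly incomplete data. */
options nosyntaxcheck;
```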
One possibility is the limit (in batch mode) of the length of a line in your submitted SAS program:
See: http://support.sas.com/kb/15/883.html
Which version of SAS are you running?

Is there any possibility that deleted data can be recovered back in SAS?

I am working in a production environment. Yesterday I accidentally made permanent changes to the Master dataset while trying to extract a sample of it into the WORK directory. Unfortunately, there is no backup of this data.
I wanted to execute this:
Data work.facttable;
Set Master.facttable(obs=10);
run;
Instead, I accidentally executed the following:
data Master.facttable;
set Master.facttable(obs=10);
run;
You can clearly see what sort of blunder it was!
Facttable had been building up for nearly two years; it was 250GB with millions of rows. Now it has 10 rows and is 128KB :(
I am very worried about how to recover the data. It is crucial for the business teams, and I have no idea how to proceed.
I know that SAS doesn't support any rollback or recovery process, and we don't use the audit trail method either.
I am just wondering whether there is any way we can still get the data back despite all this.
Details: the dataset is assigned to the SPDE engine. I checked the data files (.dpf), but all had disappeared except yesterday's data file, which is 128KB.
You appear to have exhausted most of the simple options already:
Restore from external/OS-level backup
Restore from a previous generation via the gennum= data set option (only available if the genmax option was set to 2 or more when creating the dataset, since genmax counts the current version).
Restore from SAS audit trail
I think that leaves you with just 2 options:
Rebuild the dataset from the underlying source(s), if you still have them.
Engage the services of a professional data recovery company, who might be able to recover some or all of the deleted files, depending on the complexity of your storage environment, and how much of the original 250GB has since been overwritten.
Either way, it sounds as though this may prove to have been an expensive mistake.
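For the future, a sketch of turning on generation data sets so that an accidental overwrite leaves the previous version readable through gennum=. This assumes a Base SAS engine library; the question's table is on the SPD Engine, where generation data sets may not be supported, and keeping extra generations of a 250GB table multiplies the disk usage. The library and table names come from the question; genmax=3 is an arbitrary example value.

```sas
/* Keep up to 3 versions of the table: the current one plus 2 historical generations. */
proc datasets library=master nolist;
  modify facttable(genmax=3);
run;
quit;

/* After an accidental overwrite, read the most recent historical version into WORK. */
data work.facttable_restored;
  set master.facttable(gennum=-1);  /* -1 = one generation back from the current version */
run;
```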