I'm new to SAS, coming from SQL, so I'm still getting used to the differences.
I have a SAS program "Master.sas" that runs, among other things, something like this:
%include "c:\script1.sas";
%include "c:\script2.sas";
%include "c:\script3.sas";
The question is, if I select all of them and run it, does it run sequentially or in parallel?
For example, if script2 uses a table that is loaded in script1, will it fail to run successfully?
Well, that example may sound obvious, since I tested it, but what happens if script1 calculates a variable? Will script2 get the calculated value, or will it use whatever it finds at run time (because, for example, script2 ran before script1)?
Just to clarify, I need SAS to run them sequentially, one after the other.
In SQL Server, "GO" is used to separate batches, e.g.:
CREATE TABLE XXXXX
GO
SELECT * FROM XXXXX
GO
If someone tries to run that script without GO, SQL Server processes both statements as a single batch, producing an error on the second statement saying that "table XXXXX doesn't exist".
Do I need something similar in SAS, or does SAS just process the next step when the previous one has finished?
Thanks in advance!
%include will run things in sequence. SAS runs the first %include as if its contents were lines typed into the main program, then reaches the second and does the same, and so on.
SAS's equivalent of GO is RUN, by the way, though in most cases RUN doesn't actually have to be included, because the next DATA or PROC statement also ends the previous step (it is still considered good practice). SAS will not run in parallel just because you leave out RUN, but RUN is what tells SAS to go ahead and execute the step it has been given. This does not apply in PROC SQL, however: PROC SQL does not support run-group processing and executes each statement as soon as it reaches the terminating ;. The procedure is ended with QUIT.
There are ways to make it run in parallel; for example, this hands-on workshop from SUGI 29 on parallel processing shows how to use RSUBMIT to do so. Enterprise Guide can run programs in parallel (though not the %includes within one program) if you tell it to; it does not do so by default.
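To illustrate, here is a minimal sketch of the SQL GO example written in SAS. The table name work.xxxxx mirrors the SQL example in the question; work.yyyyy and the variable x are hypothetical. Each DATA or PROC step is compiled and run as a unit when SAS reaches its RUN, so the PROC PRINT step only starts once the table exists. Inside PROC SQL, each statement runs as soon as its semicolon is reached, and QUIT ends the procedure.

```sas
/* Step 1: the DATA step runs when SAS reaches RUN.             */
/* work.xxxxx mirrors the question; x is a hypothetical column. */
data work.xxxxx;
    x = 1;
run;

/* Step 2: compiled and run only after step 1 has finished,     */
/* so work.xxxxx already exists here.                           */
proc print data=work.xxxxx;
run;

/* PROC SQL has no run-group processing: each statement runs    */
/* as soon as its terminating semicolon is reached, so the      */
/* SELECT sees the (hypothetical) table created just before it. */
proc sql;
    create table work.yyyyy as
        select * from work.xxxxx;
    select * from work.yyyyy;
quit;
```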
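%include will run things in sequence. If the code in the first %include hits an error, the lines that follow may not be processed as you expect; in batch mode, for example, SAS typically sets OBS=0 and enters syntax-check mode after an error, so later steps are checked but read no data.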
%include will always run things in sequence.
If some variable is created in script1, you can use it in script2, but if script1 depends on a variable created in script2, it will error out.
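A minimal, self-contained sketch of the sequencing described in these answers. To avoid needing real files, it writes two hypothetical stand-ins for script1.sas and script2.sas to temporary files and %includes them in order: the first loads a table and stores its row count in a macro variable, and the second uses both. The fileref names, the table work.loaded, and the macro variable ROWCOUNT are illustrative only.

```sas
/* Hypothetical stand-ins for script1.sas and script2.sas, */
/* written to temporary files so the example runs as-is.   */
filename script1 temp;
filename script2 temp;

data _null_;
    file script1;
    /* script1: load a table and count its rows into ROWCOUNT */
    put 'data work.loaded; do i = 1 to 3; output; end; run;';
    put 'proc sql noprint;';
    put '    select count(*) into :rowcount trimmed from work.loaded;';
    put 'quit;';
run;

data _null_;
    file script2;
    /* script2: the single quotes keep the macro reference     */
    /* unresolved until script2 itself runs, after script1.    */
    put 'proc print data=work.loaded; run;';
    put '%put NOTE: script2 sees ROWCOUNT=&rowcount;';
run;

/* Runs strictly in order: script2 starts only after script1 is done. */
%include script1;
%include script2;
```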
I'd like to start by saying I'm no SAS whiz by any means.
I inherited SAS code from a team that no longer exists, written by people who no longer work here, so there is nobody around who would be more familiar with how things work.
The structure of things is:
We have a SAS program that works as a scheduler, triggering a selection of smaller programs on a daily basis. It uses statements that check the time of day and, based on that, triggers programs stored on the server with a %include statement.
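For readers unfamiliar with this pattern, a minimal sketch of what such a time-of-day scheduler might look like. The macro name %run_if_due, the hour window, and the program path are all hypothetical; the inherited code is not shown in the question and may well differ.

```sas
/* Hypothetical sketch: %include a program only during its hour window. */
%macro run_if_due(path=, from_hour=, to_hour=);
    %local now;
    /* Current hour of the day (0 to 23) from the system clock */
    %let now = %sysfunc(hour(%sysfunc(time())));
    %if &now >= &from_hour and &now < &to_hour %then %do;
        %if %sysfunc(fileexist(&path)) %then %do;
            %include "&path";
        %end;
        %else %put WARNING: &path not found, skipping it.;
    %end;
%mend run_if_due;

/* Hypothetical path and window */
%run_if_due(path=/server/programs/daily_drawdown.sas, from_hour=6, to_hour=7);
```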
This has worked flawlessly for the past 2 years, but since yesterday all the programs triggered by this scheduler are producing data sets with 0 observations.
If I manually open a program on the server (the same program that the scheduler triggers), it runs fine. If the scheduler triggers it, the log shows that the data set has 0 observations and that the step was stopped.
This happens for every step in a program, starting with the first one, which can be as simple as the step below:
data drawdown;
set server01.legacy_mapping_drawdown;
run;
If I run the above step manually, log shows:
NOTE: The data set WORK.drawdown has 13643 observations and 107 variables.
If this is triggered by the %include statement, then the log reads:
NOTE: The data set WORK.drawdown has 0 observations and 107 variables.
WARNING: Data set WORK.drawdown was not replaced because this step was stopped.
I have no clue whatsoever as to why this would be happening.
The fact that this started happening on 02/02/2020 leads me to believe that the new year might have something to do with it.
The code in the scheduler hasn't been touched in a while, and the programs are still being triggered; what changes is how they behave depending on whether they are run manually or via the scheduler.
I know there are few technical details here, but there really isn't much more to it.
Would appreciate any ideas on this.
Thanks.
I'm having trouble finding out whether what I want to do is possible in SAS EG 7.15.
Context
We have split a fairly big project into several process flows, each of which has a program that defines a %macro, plus a final process flow containing a program called "EXECUTION" that calls all of those macros in a row, like this:
Flow_Process_1/program:
%macro TheFirstOne;
some code
%mend TheFirstOne;
some other process flow
Flow_Process_123/EXECUTION:
%TheFirstOne;
%TheSecondOne;
%TheThirdOne;...
The advantage is that everything is split up and easy to execute: click the first process flow, SHIFT+click the last one, and everything is executed. You also get a single log file covering all the different programs. And everything works perfectly.
The Problem
The problem is that we have a fairly long execution (in time) that we wanted to split into parallel processes. That is not hard to set up with the "Allow parallel execution on the same server" option, and the feature works like a charm when you execute programs that work on their own.
However, with the way we work, the first process (thread, I would say) knows about our %macro definitions and works perfectly, but the second and any others don't, and give errors like:
WARNING: Apparent invocation of macro xxxxxxx not resolved.
because the program that defines the macro was never executed in that process.
My attempt
I tried to use the Autoexec feature, but then I would have to copy the definition of every macro (5000+ lines) into it for it to work in every parallel process. I also looked for SAS code that runs a process flow from within a program, but it looks like that does not exist, or I did not find it.
I'm quite sure that SAS seniors will say this is not how SAS EG should be used, but that is how we did it, and I would just like to know whether there is a way to let the other processes know the definitions of all the %macros defined previously.
I have the following code running inside a macro. When it is run in interactive mode, it runs absolutely fine, with no errors or warnings, and that has been the case for the last two years.
The same code has now been deployed in batch mode, and it generates the warning WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved. No value is assigned to the macro variable.
My question is, does anyone have any ideas why batch mode and interactive mode would behave differently?
Here is some more information:
The data set is being created, and it is in the WORK library.
The data set does get opened by the DATA step.
FIRSTRECCOUNT is not initialized anywhere else in the program.
I have searched the SAS community. There is a related topic, but I don't get the same errors in batch initialization as described in the answer.
There is also detailed information on the warning, but it doesn't explain why the code would work in interactive mode but not in batch mode.
%LET FIRSTSET = work.dataset1;
/* ... (lines omitted from the log excerpt) ... */
DATA _NULL_;
    IF 0 THEN
        SET &FIRSTSET NOBS=X;
    CALL SYMPUT('FIRSTRECCOUNT', X);
    STOP;
RUN;
/* ... (lines omitted from the log excerpt) ... */
DATA _NULL_;
    IF 0 THEN
        SET &SECONDSET NOBS=X;
    CALL SYMPUT('SECONDRECOUNT', X);
    STOP;
RUN;
WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved.
Update:
So I attempted to replicate the error by copying the code that produces the warning into a separate scheduled flow, but it didn't cause any errors at all.
By the way, the original job was deployed from SAS DI Studio. I checked all lines in the user-written code nodes and made sure they were within 80 characters, as recommended by @RawFocus and @RobertPentridge, but it didn't solve the issue.
As recommended by @data_null_, I checked VALIDVARNAME, and it was different between interactive mode (ANY) and batch mode (V7), but changing it made no difference.
I rewrote the logic to get the number of observations by calling ATTRN on an open data set. This eliminated the warning, but the program would still fail, with warnings popping up in different places, which made me think Robert Partridge is correct. At the same time, I got an error about a macro not being resolved. The macro had been inserted by DI Studio to collect performance MI even though the job wasn't meant to be collecting MI. This made me think that SAS DI Studio is not generating code correctly when deploying it, so I manually edited the deployed code to remove the offending macro call. I also spotted one line of code with an MD5 function call that was too long because of the number of parameters passed to it, so I inserted some white space. And finally the problem was fixed!!
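The approach mentioned above, counting observations on an open data set with the ATTRN function, might look something like the following minimal sketch. It is not the poster's actual code: the macro name %obscount is hypothetical, and the sample data set stands in for work.dataset1. OPEN returns a data set identifier (0 on failure), ATTRN with NLOBS returns the number of logical observations, and CLOSE releases the data set. Because the macro contains only macro statements, it can be called inline in a %LET.

```sas
/* Sample data set standing in for work.dataset1 */
data work.dataset1;
    do i = 1 to 5;
        output;
    end;
run;

/* Hypothetical helper: returns the observation count of a data set, */
/* or -1 if the data set cannot be opened.                           */
%macro obscount(ds);
    %local dsid n rc;
    %let n = -1;
    %let dsid = %sysfunc(open(&ds));
    %if &dsid > 0 %then %do;
        /* NLOBS: logical observations, excluding rows marked deleted */
        %let n = %sysfunc(attrn(&dsid, nlobs));
        %let rc = %sysfunc(close(&dsid));
    %end;
    &n
%mend obscount;

%let firstset = work.dataset1;
%let firstreccount = %obscount(&firstset);
%put NOTE: FIRSTRECCOUNT=&firstreccount;
```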
I still need to do something about the job, because when it is redeployed from SAS DI, it will generate the same errors again. I don't have time to look into this further at the moment.
Conclusion: what you write in SAS DI and what gets deployed can be slightly different, which can cause the parser to throw errors in random places. So I will mark Robert's answer as correct, because it got me closer to solving the problem than any other answer.
The problem could be happening above the code snippet you pasted. The parser got into a funk earlier, and ended up issuing warning about code that is perfectly fine.
Check that no code within a macro is longer than ~160 characters on a single line. I try to keep my code well below that, but long lines of code can run fine interactively and fail in batch, particularly inside a macro.
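Because a SAS statement ends at its semicolon rather than at the end of a physical line, a long call (such as an MD5 call with many arguments) can simply be wrapped across several short lines. A minimal sketch; the data set names and the columns col_a, col_b, and col_c are hypothetical.

```sas
/* Hypothetical input with a few columns to hash */
data work.source;
    length col_a col_b col_c $10;
    col_a = 'alpha'; col_b = 'beta'; col_c = 'gamma';
run;

data work.hashed;
    set work.source;
    length row_key $32;
    /* One statement wrapped over several short lines: SAS reads up to */
    /* the semicolon, so the line breaks do not change its meaning.    */
    row_key = put(
        md5(catx('|',
                 col_a,
                 col_b,
                 col_c)),
        $hex32.);
run;
```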
I expect your program has some small error above that does not cause SAS to go into syntax check mode when run interactively but does cause SAS to set obs to 0 and enter syntax check mode when run in batch.
One possibility is the limit (in batch mode) on the length of a line in your submitted SAS program:
See: http://support.sas.com/kb/15/883.html
Which version of SAS are you running?
I have built something in SAS to pull down Yahoo! Finance .csv data. The code now works fine, and I have built some robust error handling into it. The problem I have had with the data, though, is that the .csv feed is unsupported and not clean.
The data is comma-delimited, but some of the data also contains commas. Some of the fields are in quotes and some are not. The length of the fields also varies wildly. A field like Market Capitalisation, for example, could run from a few million to hundreds of billions.
As a result, if you pass multiple stock metrics for multiple stocks through to the Yahoo! API at the same time, you will get rows of .csv data where each field is in a different place, is a different length and is inconsistently delimited.
I have tried multiple INFILE options that could handle some of these problems in isolation, but not all of them together. The only solution that works is to download a single metric for multiple stocks at a time.
This gives me what I want, but it takes over an hour to run the data for the NASDAQ and the NYSE. Have I overlooked another method for handling this type of problem?
Thanks
This is the outline of a way to do what you are looking for. The whole of the code to do this would be too long to post here and out of scope of what this site looks to do.
Create a SAS program that takes a stock ticker from the SYSPARM automatic macro variable and downloads the data to a data set, named after the ticker, in a permanent library.
The SYSPARM macro variable is set by the value you pass on the command line when calling SAS:
sas.exe myprog.sas -sysparm XYZ
This would set &SYSPARM to resolve to XYZ.
Write a SAS program that merges all the ticker data sets together for further processing.
Create a program in a language like Perl or Python, (or shell script, etc.) that loops over a range of tickers and calls your SAS program, passing the ticker through SYSPARM.
Use a threading or forking package from that language to have several of these running at the same time. You can probably go to some multiple of the number of CPU cores on your machine, as this processing will not be CPU-intensive. Test values until you find one that works.
From that same language call your SAS program to merge the datasets.
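A minimal sketch of the launch, wait, and merge steps kept in SAS itself: it swaps in SAS's SYSTASK and WAITFOR statements for the Perl or Python driver described above, which is a different technique from the one the answer suggests. It assumes that the sas command is on the PATH, that the XCMD system option permits host commands, and that myprog.sas starts with something like %let ticker = %upcase(&sysparm); and writes one data set per ticker. The macro name %run_tickers, the library path, and the ticker list are hypothetical. Ticker symbols containing characters such as . or - are not valid data set names and would need to be mapped to valid ones.

```sas
/* Hypothetical driver: one batch SAS session per ticker, run in parallel. */
%macro run_tickers(tickers);
    %local i n t;
    %let n = %sysfunc(countw(&tickers, %str( )));
    %do i = 1 %to &n;
        %let t = %scan(&tickers, &i, %str( ));
        /* NOWAIT: start the session and return immediately */
        systask command "sas myprog.sas -sysparm &t -log &t..log"
            taskname=task&i nowait;
    %end;
    /* Block until every launched session has finished */
    waitfor _all_ %do i = 1 %to &n; task&i %end;;
%mend run_tickers;

%run_tickers(AAPL MSFT IBM);

/* Merge the per-ticker data sets (hypothetical library path). */
/* INDSNAME= records which data set each row came from.        */
libname stocks "/data/stocks";

data work.all_tickers;
    set stocks.aapl stocks.msft stocks.ibm indsname=src;
    source_ds = src;
run;
```

Waiting on all task names before the merge mirrors the answer's ordering: the merge step only starts once every per-ticker session has finished.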
Does anyone know how to use the "execute selection" function in the do-file editor of Stata for code that spans multiple lines?
Currently I can't find a way to do this without using the #delimit ; system, which requires repeating "#delimit ;" at the beginning of every block I want to run.
Any suggestions appreciated!
I believe that you might be misunderstanding the #delimit ; command: it is useful when you are writing a do-file that will be executed in its entirety afterwards. I also assume that you are using Stata 11, since previous versions behave differently (if I recall correctly, Stata 10 SE for Mac does not support // comments and delimiting, for example).
If you are executing only a fraction of the code, use /// at the end of a line to continue its command on the next one.
Basic example (that will clear any open data, so beware):
sysuse lifeexp, clear
sc lexp safewater, ///
mlab(country)
This should run flawlessly even if you execute the sysuse command and the sc (scatter) commands separately. The sc command has the mlab option (to add labels to the data points) on a different line, but both lines will be interpreted as only one command due to the /// indication.
Hope this helps!