I run some SAS queries monthly and they all take a fairly long time to run. I was wondering if there is any way I can schedule these to run on a certain date each month, in a certain order?
Thanks for your help!
On unix you could set the programs up to run in batch mode with a cron job.
One trick you could use would be to set up a master SAS program to run everything.
Make one program that just contains any global variables that need to be changed each month, and then call your monthly programs with %include statements.
Something like:
%let globalvar1 = ThisMonth;
%let globalvar2 = LastMonth;
%include '/path/to/sas/program1';
%include '/path/to/sas/program2';
Then you run only this one program in batch. It will run the programs in the correct order and automatically wait for each one to finish before moving on to the next (scheduling them as separate cron jobs would force you to overestimate how long each one takes so they wouldn't conflict).
This will dump everything into one log file, which may be good or bad.
Another option would be to use the X statement to call the next program from the OS at the end of each run.
I am not 100% sure of the syntax, but something like this should work if you use the right invocation for your OS (this works on both Unix and Windows, so you would only have to schedule the first program).
At the end of each program just add:
X "Path/to/sas.exe" -batch -noterminal nextProgram.sas
This will let you chain the programs together so that each one starts the next when it finishes. Then you just use Task Scheduler/cron to start "sas.exe -batch -noterminal firstProgram.sas".
Depending on which system you are working with, the method may differ.
The main idea is to store all of the queries in a single SAS program and then use the system's scheduler (for example, Task Scheduler on Windows) to run it monthly.
A quick reference (for Windows):
http://analytics.ncsu.edu/sesug/2006/CC04_06.PDF
Related
I want to parallelise my googletest cases in C++.
I have read the documentation on googletest sharding but have been unable to implement it in my C++ environment.
As I'm new to coding, can anyone please explain the documentation in the link below with a code example?
https://github.com/google/googletest/blob/master/googletest/docs/advanced.md
Does googletest sharding only work across different machines, or can it be implemented on the same machine using multiple processes or threads?
Sharding isn't done in code, it's done using the environment. Each machine sets two environment variables: GTEST_TOTAL_SHARDS, the total number of shards you are running across, and GTEST_SHARD_INDEX, which is unique to each machine. When googletest starts up, it selects the corresponding subset of the tests.
If you want to simulate this, then you need to set these environment variables yourself (which can be done in code).
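For instance, a minimal sketch of setting them from inside the test binary itself (assuming a POSIX system; on Windows you would use _putenv_s instead of setenv):
// Makes this process run only shard 0 of 4 by setting the variables
// before googletest partitions the tests.
#include <cstdlib>
#include <gtest/gtest.h>

int main(int argc, char **argv) {
    setenv("GTEST_TOTAL_SHARDS", "4", 1);  // total number of shards
    setenv("GTEST_SHARD_INDEX", "0", 1);   // which shard this process runs
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}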
I would probably try something like this (on Windows) in a .bat file:
set GTEST_TOTAL_SHARDS=10
FOR /L %%I in (1,1,10) DO cmd.exe /c "set GTEST_SHARD_INDEX=%%I && start mytest.exe"
And hope that the new cmd instance has its own environment.
Running the following in a command window worked for me (very similar to James Poag's answer, but note the change of range from "1,1,10" to "0,1,9", "%%" -> "%", and "set" to "set /A"):
set GTEST_TOTAL_SHARDS=10
FOR /L %I in (0,1,9) DO cmd.exe /c "set /A GTEST_SHARD_INDEX=%I && start mytests.exe"
After further experimentation, it is also possible to do this in C++, but it is not straightforward and I did not find a portable way of doing it. I can't post the code, as it was written at work.
Essentially, from main, create n new processes (where n is the number of cores available), capture the results from each shard, merge them and output to the screen.
To get each process running a different shard, the controller gives the total number of shards and the instance number to each child process.
This is done by retrieving and copying the current environment, and setting the two environment variables (GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX) in the copy as required. GTEST_TOTAL_SHARDS is always the same, but GTEST_SHARD_INDEX will be the instance number of the child.
Merging the results is tedious but straightforward string manipulation. I successfully managed to get a correct total at the end by adding up the results of all the separate shards.
I was using Windows, so I used CreateProcessA to create the new processes, passing in the custom environment.
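A rough, untested sketch of that shape (not the original code; "mytests.exe" and the shard count are placeholders, output capture/merging is omitted, and a real implementation would copy the parent environment into the block as described above):
#include <windows.h>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    const int kShards = 4;  // e.g. the number of cores available
    std::vector<PROCESS_INFORMATION> children;

    for (int shard = 0; shard < kShards; ++shard) {
        // ANSI environment block: "NAME=VALUE\0NAME=VALUE\0\0".
        // Only the two gtest variables are set here, for brevity.
        std::string env;
        env += "GTEST_TOTAL_SHARDS=" + std::to_string(kShards);
        env.push_back('\0');
        env += "GTEST_SHARD_INDEX=" + std::to_string(shard);
        env.push_back('\0');
        env.push_back('\0');  // a second NUL terminates the block

        STARTUPINFOA si = {};
        si.cb = sizeof(si);
        PROCESS_INFORMATION pi = {};
        char cmdline[] = "mytests.exe";  // CreateProcessA may modify this buffer

        if (!CreateProcessA(nullptr, cmdline, nullptr, nullptr, FALSE,
                            0, &env[0], nullptr, &si, &pi)) {
            std::fprintf(stderr, "CreateProcessA failed: %lu\n", GetLastError());
            continue;
        }
        children.push_back(pi);
    }

    for (auto &pi : children) {
        WaitForSingleObject(pi.hProcess, INFINITE);  // wait for each shard
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
    }
    return 0;
}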
It turned out that creating new processes takes a significant amount of time, but my program was taking about 3 minutes to run, so there were good benefits to be had from parallel running: the time came down to about 30 seconds on my 12-core PC.
Note that if this all seems like overkill, there is a Python program which does what I have described here using a script (I think; I haven't used it). That might be more straightforward.
Is there a way to stop the process if certain criteria in any of the programs in this process are met?
I have a process consisting of 5 SAS programs. This process is scheduled to run at 8am every morning. However, sometimes the database is not refreshed and the process then sends out weird figures.
I need some "exception control". In the 2nd program I check the database against some criteria. If there is no error, then keep running the rest of the code; otherwise, send out a notification email and STOP running the 2nd program and all the subsequent programs.
I tried %abort cancel, but it only terminates the current program; the subsequent programs are not affected. I could do the checking in every single program, but that would make the code redundant...
I also tried googling "terminate SAS process", but most results refer to abort statements, which don't help...
If you're using Enterprise Guide, this is built in via logic gates.
First, in the program that determines whether the database file passed or failed (the "gate program"), assign a macro variable a value based on that test. Presumably this program will do only things you're happy for it to do even if the check fails.
On the process flow page, right click on that gate program and select 'Condition -> Add'.
Then add a condition based on a macro variable, and use 'equals' and the value you're looking for (or 'greater than', or whatever makes sense). Then select the next task after "Then run this task", and put the other option after "Else run this task".
Then, whichever of the two is forward-moving should have links to the rest of the programs you want run; the one that's not should end the process.
SAS gives an example of how to do that in KB Sample 39995 including a sample project you can download.
Second, you can set OBS=0 if you reach the error condition. This will let SAS continue working, but in most cases it won't be able to do anything (since with OBS=0 it can only process 0 records of any dataset). I'm not sure that's a guarantee that it won't do anything, but in everything I've done it has been sufficient. I have also used OPTIONS ERRORABEND, which works fine if you do all of your processing with external libnames which won't automatically reconnect when SAS is reconnected.
My understanding is that this is a batch process. You don't specify which operating system you are running the process on; let's suppose it is UNIX/Linux (I am hoping it is similar on Windows). Let's assume that your 5-program process is run by the following shell script:
sas /program1.sas
sas /program2.sas
sas /program3.sas
sas /program4.sas
sas /program5.sas
If you want to stop the remaining process after program2.sas completes with ERRORs or WARNINGs, you can modify your script to be
sas /program1.sas
sas /program2.sas
if [ $? -ne 0 ]
then
exit
fi
sas /program3.sas
sas /program4.sas
sas /program5.sas
In this script, the special shell variable $? holds the status code returned by the previous command, i.e. the SAS run (0 means successful completion). If it is not 0, then the whole script stops due to the exit command.
For more information and code examples see How to conditionally terminate a SAS batch flow process in UNIX/Linux SAS blog post.
Is there any macro command that allows calling one program from another (the %run_program() pseudo-code below)?
Program "Settings":
%let myvar="HELLO WORLD!";
Program "Program":
%run_program(Settings); *Pseudo-code;
%put &myvar; *Should print *Should print "HELLO WORLD!";
This is not an exact answer to your question, but if you only want to be sure that Settings runs before Program when you run the process flow, you can link them together.
Right click Settings,
choose Link Settings to...,
and pick Program from the dialog.
Run the process flow and see "HELLO WORLD!" printed in the log.
I think that you are looking for the %INCLUDE statement.
You would have to save 'Settings' as a stand-alone program on your server like '/myserver/somefolder/settings.sas'.
Then you could ensure that it is run via:
...some code
%include '/myserver/somefolder/settings.sas';
... more code
The program would run exactly as it would if you copy-pasted the contents of 'settings.sas' into the current program.
In addition to the Process Flow, you also can create an Ordered List. This allows you to run programs in a single process flow in multiple different orders (or run a subset of a process flow).
You create one via New -> Ordered List, then add programs to it and move them up/down into the order you want. You will then see the ordered list in the project tree on the left and can right click it to run it (or select it and press F8).
There is no macro command to run a program in Enterprise Guide; you can use automation via .NET if you want to do things like that. Chris Hemedinger on The SAS Dummy has a good article on EG automation.
I'm developing a small in house alternative to Tripwire, so I've coded a small script to hash files in a JBoss EAP server, and store the path and the hash in a MySQL database.
Every day the script compares the hashes in the filesystem with those saved in the DB, so any change is logged and finally reported using JasperServer.
The script runs at night using cron. To avoid a large number of scripts querying the DB at the same time, it calls time.sleep(RANDOM_NUMBER_OF_SECONDS) before doing the fun stuff, but sometimes time.sleep seems to sleep forever and the script ends without any error; I checked the mail cron sends and no error is logged. Any help would be appreciated. I'm using jython-standalone-2.5.3, IBM's JDK and RHEL 5.6 running inside VMware.
I just found http://bugs.jython.org/issue1974, and a code comment seems to indicate that OS signals can cause this behavior, but I'm not sure if this is my case.
If you want to see the code, check it out at http://code.google.com/p/pysnapshot/
Luis García Bustos.
I don't know why you think time.sleep() can reduce the number of scripts querying the DB at the same time.
IMO it is better to use cron to call the program periodically. After it starts, it should check whether there is a "semaphore" file in the /tmp/ directory, for example /tmp/snapshot_working.txt. If there is no semaphore file, create it and write to it something like "snapshot started: 2012-12-05 22:00:00". After your program completes its checks, it should remove this file. If at start-up the program finds the semaphore file, it can simply stop, or check whether the date and time saved in that file look "old". If they are "old", remove the file and start normally, noting in the log that an "old" file was found (an administrator can then find such long-running snapshots and terminate them).
The only reason to call time.sleep() in your case is if you want to run such a script during normal working hours without mounting a denial-of-service attack on your DB. For example: after making 100 DB queries you can sleep a little and give the DB time to serve other users' queries. But I think the sooner the program finishes, the better.
I need to run a Linux command such as "df" from my Linux daemon to learn the free space, used space, total size of a partition and other info. I have options like calling system, exec, popen, etc.
But as each of these commands spawns a new process, is it not possible to run the commands in the same process from which they are invoked?
At the same time, I need to run this command from a Linux daemon, and my daemon should not hold any terminal. Will it affect my daemon's behavior?
Or is there any standard C or C++ API for getting information about the mounted partitions?
There is no standard API, as this is an OS-specific concept.
However,
You can parse /proc/mounts (or /etc/mtab) with (non-portable) getmntent/getmntent_r helper functions.
Using the information about each mounted filesystem, you can then get its statistics with statfs (see the sketch after this list).
You may find it useful to explore the i3status program source code: http://code.stapelberg.de/git/i3status/tree/src/print_disk_info.c
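A minimal sketch of those two steps (Linux-specific and non-portable; it just prints totals for every mount point):
#include <cstdio>
#include <mntent.h>
#include <sys/statfs.h>

int main() {
    // Walk the list of mounted filesystems.
    FILE *mounts = setmntent("/proc/mounts", "r");
    if (!mounts) {
        std::perror("setmntent");
        return 1;
    }
    struct mntent *ent;
    while ((ent = getmntent(mounts)) != nullptr) {
        struct statfs fs;
        if (statfs(ent->mnt_dir, &fs) != 0)
            continue;  // skip pseudo-filesystems we cannot stat
        unsigned long long total = (unsigned long long)fs.f_blocks * fs.f_bsize;
        unsigned long long avail = (unsigned long long)fs.f_bavail * fs.f_bsize;
        std::printf("%-24s %-16s total=%llu MiB free=%llu MiB\n",
                    ent->mnt_fsname, ent->mnt_dir, total >> 20, avail >> 20);
    }
    endmntent(mounts);
    return 0;
}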
To answer your other questions:
But as each of these commands spawns a new process, is it not possible to run the commands in the same process from which they are invoked?
No; these 'commands' are entire self-contained programs, and they must run in their own process.
Depending upon how often you wish to execute your programs, fork(); exec() is not so bad. There is no hard limit beyond which it would be better to gather the data yourself rather than execute a helper program. Once a minute, you're probably fine executing the commands; once a second, you're probably better off gathering the data yourself. I'm not sure where the dividing line is.
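If you do go the external-command route, here is a minimal sketch using popen (which still forks a child under the hood; "df -P" is just an example command):
#include <stdio.h>
#include <string>

// Runs "df -P" and returns its full output as a string (empty on failure).
std::string run_df() {
    std::string out;
    FILE *pipe = popen("df -P", "r");  // -P asks for the stable POSIX output format
    if (!pipe)
        return out;
    char buf[512];
    while (fgets(buf, sizeof(buf), pipe) != nullptr)
        out += buf;
    pclose(pipe);
    return out;
}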
At the same time, I need to run this command from a Linux daemon, and my daemon should not hold any terminal. Will it affect my daemon's behavior?
If the command calls setsid(2) and then open(2) on a terminal without O_NOCTTY, that terminal might become the controlling terminal for that process. But that wouldn't influence your program: your program already disowned its terminal when becoming a daemon, and as the child process is the leader of its own session, it cannot change your process's controlling terminal.