Better cleanup after protocolLogging() on UniVerse?

I have a Basic program that is used to send email via a web service. I want to leverage the protocolLogging() function to catch any issues that sometimes arise when interacting with a web service. However, I don't want to be flooded with all the success logs (right now I'm generating one log file every time the program executes). With the log level set to 3 (or 1 for that matter), an empty file is still generated after a successful run.
Is there a way to prevent the empty log files from being generated in the first place?
If not, is there a better way to clean them up than opening the log directory in the program and deleting the record that was just created (assuming everything else ran successfully)? My log directory is not currently a UniVerse type-19 file or anything.
Is it better to reuse one log file throughout the day? If I change the code to do this, are there any performance implications? This program is a subroutine that gets used heavily at times.
I'm running UniVerse v11.2.3 on Windows Server 2008 R2.

When you use protocolLogging(), you are not going to gain any speed advantage by logging everything to one file versus multiple files as you do now. Even with a single log file, turning logging on still opens the file, and turning it off still writes to and closes it.
There are three ways you can clean up the junk files.
1) Create a .BAT file and run it at the end of the day via the Windows Task Scheduler:
cleanupemptylog.bat
for %%F in (c:\ud\TEST\*.log) do if %%~zF equ 0 del "%%F"
2) Use a UniVerse phantom process to do the same thing. You can create a subroutine that does better checks if you want.
3) Run the same command at the end of your current subroutine:
EXECUTE '!for %F in (*.log) do if %~zF equ 0 del "%F"'
This will add overhead to your existing subroutine, but it may not be noticeable.
If you want to use a single log based on the current date, then you could just clean up the logs on a monthly basis using a phantom or the Windows scheduler, e.g. delete all log files that are more than 5 days old:
forfiles -p "c:\ud\TEST" -s -m *.log -d -5 -c "cmd /c del @path"
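If you'd rather do the "better checks" from option 2 in a small script instead of a .BAT file, a rough sketch in Python (using the same example directory and five-day cutoff as above) might look like this:
import glob
import os
import time

LOG_DIR = r"c:\ud\TEST"     # same example directory as above
MAX_AGE_DAYS = 5

now = time.time()
for path in glob.glob(os.path.join(LOG_DIR, "*.log")):
    if os.path.getsize(path) == 0:
        os.remove(path)                                    # empty log from a clean run
    elif now - os.path.getmtime(path) > MAX_AGE_DAYS * 86400:
        os.remove(path)                                    # log older than the cutoff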
Nathan Rector
International Spectrum
http://www.intl-spectrum.com

Related

File modification time gets overwritten by background cache flushing

I have code that performs the following steps:
1. open the file
2. write data
3. set the file timestamps (via SetFileInformationByHandle(FileBasicInfo))
4. close the file
When the file is stored on certain NAS devices (and accessed via a share), its modification time ends up being set to the current time.
According to Process Monitor, the Close() in step 4 results in a Write (the local cache gets flushed/pushed to the NAS device) that (seemingly) updates the file's mtime on the server.
If I add FlushFileBuffers() (or sleep for a few seconds) between steps 2 and 3, everything is fine.
Is this a bug in the SMB implementation of this NAS device (Dell EMC Isilon), or did SetFileInformationByHandle() never promise anything in the first place?
What is the best way to deal with this situation? I would really like to avoid having to call FlushFileBuffers()...
Edit: Great... :-/ It looks like for executables (and only executables) the atime (last access time) gets mangled too, in the same way. These are harder to reproduce, though; I need to run this logic a few times. Could be some antivirus... I am still investigating.
Edit 2: According to procmon, the access time gets updated by EXPLORER.EXE: when it sees an executable, it can't resist opening it and reading portions of it (probably to extract the icon).
You can't really do anything; I guess Isilon's SMB implementation doesn't support certain things (that would have preserved the timestamps).
I simply added FlushFileBuffers() before SetFileInformationByHandle() and made sure there are no related race conditions in my code.
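For illustration, here is a sketch of the reordered sequence done from Python via ctypes rather than the original native code; the UNC path and the example timestamp are made up, but the point is the FlushFileBuffers() call between the write and SetFileInformationByHandle():
import ctypes
import msvcrt
import time
from ctypes import wintypes

class FILE_BASIC_INFO(ctypes.Structure):
    # Times are 100-nanosecond ticks since 1601-01-01 UTC; 0 means "leave unchanged".
    _fields_ = [("CreationTime",   ctypes.c_longlong),
                ("LastAccessTime", ctypes.c_longlong),
                ("LastWriteTime",  ctypes.c_longlong),
                ("ChangeTime",     ctypes.c_longlong),
                ("FileAttributes", wintypes.DWORD)]

FileBasicInfo = 0  # value of FileBasicInfo in FILE_INFO_BY_HANDLE_CLASS

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.FlushFileBuffers.argtypes = [wintypes.HANDLE]
kernel32.SetFileInformationByHandle.argtypes = [
    wintypes.HANDLE, ctypes.c_int, ctypes.POINTER(FILE_BASIC_INFO), wintypes.DWORD]

def to_filetime(unix_seconds):
    # Convert a Unix timestamp to Windows FILETIME ticks.
    return int((unix_seconds + 11644473600) * 10000000)

path = r"\\nas\share\example.dat"                  # placeholder UNC path
with open(path, "wb") as f:                        # 1) open
    f.write(b"payload")                            # 2) write data
    f.flush()                                      #    push Python's own buffer to the OS
    handle = msvcrt.get_osfhandle(f.fileno())
    # Workaround: force the cached data out to the server *before* stamping,
    # so the deferred SMB write cannot bump the mtime afterwards.
    if not kernel32.FlushFileBuffers(handle):
        raise ctypes.WinError(ctypes.get_last_error())
    info = FILE_BASIC_INFO(0, 0, to_filetime(time.time() - 3600), 0, 0)  # example mtime
    if not kernel32.SetFileInformationByHandle(    # 3) set timestamps
            handle, FileBasicInfo, ctypes.byref(info), ctypes.sizeof(info)):
        raise ctypes.WinError(ctypes.get_last_error())
# 4) the close happens when the with-block exits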

How to print SAS program log to console when running in batch mode

I am running my SAS program in batch mode through the Windows command prompt.
Start /WAIT "SAS_job" "C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -sysin D:\MySAS_Test.sas -nosplash -nologo -noicon
Can I display the SAS output or log on the command prompt instead of writing it to a file? Or print the log as the SAS program is running, to track progress?
I think this is possible in Unix, but I'm not so sure about Windows.
You can write to STDOUT on Unix, as mentioned in the documentation, but I don't see anything similar for Windows.
The closest thing is unnamed pipes, which let you interact with the console, but it's unclear whether that is actually helpful for you.
Unfortunately, I suspect SAS generally doesn't treat Windows as a server-type environment and mostly supports it for desktop use; while it certainly does support Windows servers, most SAS server usage is on Linux/Unix.
Your better bet is probably to go in the direction of another program that reads from the log file already being produced and writes it out to the console, something analogous to tail in Unix. Or, as mentioned in the comments, open the log in a text editor (you can even 'push' this from SAS if you have the XCMD option enabled) and let it auto-refresh periodically.
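A minimal sketch of such a follower in Python, assuming you point the SAS log at a known path (for example by adding -log D:\MySAS_Test.log to the command line) and run this in a second console while the job executes:
import os
import sys
import time

LOG_PATH = r"D:\MySAS_Test.log"        # assumption: pass -log D:\MySAS_Test.log to sas.exe

while not os.path.exists(LOG_PATH):    # SAS may take a moment to create the log
    time.sleep(1)

log = open(LOG_PATH, "r", errors="replace")
try:
    while True:                        # stop with Ctrl+C once the job has finished
        line = log.readline()
        if line:
            sys.stdout.write(line)     # echo each new log line to the console
            sys.stdout.flush()
        else:
            time.sleep(1)              # nothing new yet; poll again shortly
finally:
    log.close()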
In fact, one common workflow is to edit your SAS programs in, for example, UltraEdit; it can even run them in batch directly and then retrieve the log for you.

Jython 2.5.3 and time.sleep

I'm developing a small in house alternative to Tripwire, so I've coded a small script to hash files in a JBoss EAP server, and store the path and the hash in a MySQL database.
Every day the script compares the hashes in the filesystem with those saved in the DB, so any change is logged and finally reported using JasperServer.
The script runs at night using cron. To avoid a large number of scripts querying the DB at the same time, it calls time.sleep(RANDOM_NUMBER_OF_SECONDS) before doing the fun stuff, but sometimes time.sleep seems to sleep forever and the script ends without any error; I checked the mail cron sends and no error is logged. Any help would be appreciated. I'm using jython-standalone-2.5.3, IBM's JDK, and RHEL 5.6 running inside VMware.
I just found http://bugs.jython.org/issue1974, and a code comment seems to indicate that OS signals can cause this behavior, but I'm not sure if this is my case.
If you want to see the code, check it out at http://code.google.com/p/pysnapshot/
Luis García Bustos.
I don't see why you think time.sleep() would reduce the number of scripts querying the DB.
IMO it is better to use cron to call the program periodically. After it starts, it should check whether a "semaphore" file exists in the /tmp/ directory, for example /tmp/snapshot_working.txt. If there is no semaphore file, create it and write something like "snapshot started: 2012-12-05 22:00:00" to it. After the program completes its checking, it should remove the file. If at startup the program finds the semaphore file, it can either just stop, or check whether the date and time saved in the file look "old". If they are "old", remove the file and start normally, noting in the log that an "old" file was found (the administrator can then find such long-running snapshots and terminate them).
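A minimal sketch of that semaphore-file idea (the path, the message format and the 12-hour staleness cutoff are only illustrative choices), which should also run under Jython 2.5:
import os
import time

LOCK_FILE = "/tmp/snapshot_working.txt"
STALE_AFTER = 12 * 3600          # seconds after which a leftover lock counts as "old"

def acquire_lock():
    if os.path.exists(LOCK_FILE):
        age = time.time() - os.path.getmtime(LOCK_FILE)
        if age < STALE_AFTER:
            return False         # another run is (probably) still working
        print("removing stale lock file: %d seconds old" % age)   # note it in the log
        os.remove(LOCK_FILE)
    f = open(LOCK_FILE, "w")
    f.write("snapshot started: " + time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
    f.close()
    return True

def release_lock():
    if os.path.exists(LOCK_FILE):
        os.remove(LOCK_FILE)

if acquire_lock():
    try:
        pass                     # ... hash the files and compare against the DB here ...
    finally:
        release_lock()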
The only reason to use time.sleep() in your case is if you want to run such a script during normal working hours without mounting a denial-of-service attack on your own DB. Example: after making 100 DB queries you can sleep a little and give the DB time to serve other users' queries. But I think the sooner the program finishes, the better.

Ok to write to stdout on Unix process without terminal?

I want to be sure that the following will not compromise my process:
The Solaris program writes heavily to stdout (via the C++ wcout stream). The output serves for tracing, so during testing and analysis the programmer/tester can easily observe what happens. But the program is actually a server process, so in the production version it will run as a daemon without an attached console and write all the trace output to files.
I assume that stdout is redirected to /dev/null for a program without a console; in that case I guess all is fine. However, I want to be sure that the stdout output is not buffered somewhere such that after sufficient run time we could have memory or disk space problems.
Note: we cannot redirect the trace output to a file because this would grow too large. Instead our own file tracing mechanism makes sure that new files are created and old ones deleted to always keep a certain amount of tracing and not more.
That depends on how the daemon is started, I guess. When the daemon process is created, the streams have to be taken care of somehow (for example, they need to be detached from the current process, lest the daemon be terminated when the shell from which it was started manually exits).
It depends on how the daemon is started. If it's started as a cron job, the output will be captured and mailed to whoever owns the crontab entry, unless you redirect the output in the command line. (But programs started as cron jobs aren't truly daemons.)
More generally, all processes are started from another program (except the init process); most of the time, that program is a shell (even crontab invokes a shell to start its jobs), and the command is given as a command line. And you can redirect the output anywhere you please in a command line; /dev/null is a popular choice for cases like yours.
Most daemons are started from an rc file, a shell script installed under /etc/rcN.d. Just redirect your output there.
Or better yet, rewrite your code to use some form of rotating logs instead of standard out.
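For illustration, here is roughly what "taking care of the streams" looks like inside a daemonizing step, sketched in Python although the server in question is C++; the standard descriptors are pointed at /dev/null so later writes are simply discarded rather than buffered anywhere:
import os
import sys

def detach_std_streams():
    # Point stdin/stdout/stderr at /dev/null so anything written to them later
    # is thrown away; nothing accumulates in memory or on disk.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_RDWR)
    os.dup2(devnull, 0)   # stdin
    os.dup2(devnull, 1)   # stdout
    os.dup2(devnull, 2)   # stderr
    os.close(devnull)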

What is your strategy for writing logs in your software to deal with a possibly HUGE amount of log messages?

Thanks for your time and sorry for this long message!
My work environment
Linux, C/C++ (but I'm new to the Linux platform)
My question in brief
In the software I'm working on we write a LOT of log messages to local files, which makes the files grow fast and finally use up all the disk space (ouch!). We want these log messages for troubleshooting purposes, especially after the software is released to the customer site. It is of course unacceptable to take up all the disk space on the customer's computer, but I have no good idea of how to handle this, so I'm wondering if somebody has a good idea here. More info goes below.
What I am NOT asking
1). I'm NOT asking for a recommended C++ log library. We wrote a logger ourselves.
2). I'm NOT asking about what details (such as time stamp, thread ID, function name, etc.) should be written in a log message. Some suggestions can be found here.
What I have done in my software
I separate the log messages into 3 categories:
SYSTEM: Only log the important steps in my software. Example: an outer invocation of an interface method of my software. The idea is that from these messages we can see what is generally happening in the software. There aren't many such messages.
ERROR: Only log error situations, such as an ID not being found. There usually aren't many such messages.
INFO: Log the detailed steps running inside my software. For example, when an interface method is called, a SYSTEM log message is written as mentioned above, and the entire call path into the internal modules behind the interface method is recorded with INFO messages. The idea is that these messages help us identify the detailed call stack for troubleshooting or debugging. This is the source of the use-up-all-the-disk-space issue: there are always SO MANY INFO messages when the software is running normally.
My tries and thoughts
1). I tried not recording any INFO log messages. This resolves the disk space issue, but I also lose a lot of information for debugging. Think about this: my customer is in a different city and it's expensive to go there often. Besides, they use an intranet that is 100% inaccessible from outside. Therefore we can't always send engineers on-site as soon as they hit problems, and we can't start a remote debug session. So log files are, I think, the only thing we can use to figure out the root of the trouble.
2). Maybe I could make the logging strategy configurable at run time (currently it is fixed before the software runs), that is: at normal run time the software only records SYSTEM and ERROR logs; when a problem arises, somebody could change the logging configuration so that INFO messages are logged as well. But still: who would change the configuration at run time? Maybe we should train the software admin?
3). Maybe I could always turn the INFO message logging on but pack the log files into a compressed package periodically? Hmm...
Finally...
What is your experience in your projects/work? Any thoughts/ideas/comments are welcome!
EDIT
THANKS for all your effort!!! Here is a summary of the key points from all the replies below (and I'll give them a try):
1). Do not use large log files. Use relatively small ones.
2). Deal with the oldest ones periodically (either delete them, or zip them and move them to larger storage).
3). Implement a run-time configurable logging strategy.
There are two important things to take note of:
Extremely large files are unwieldy. They are hard to transmit, hard to investigate, ...
Log files are mostly text, and text is compressible
In my experience, a simple way to deal with this is:
Only write small files: start a new file for a new session or when the current file grows past a preset limit (I have found 50 MB to be quite effective). To help locate the file in which the logs have been written, make the date and time of creation part of the file name.
Compress the logs, either offline (once the file is finished) or online (on the fly).
Put a cleaning routine in place: delete all files older than X days, or whenever you reach more than 10, 20 or 50 files, delete the oldest.
If you wish to keep the System and Error logs longer, you might duplicate them in a specific rotating file that only tracks them.
Put together, this gives the following log folder:
Log/
info.120229.081643.log.gz // <-- older file (to be purged soon)
info.120306.080423.log // <-- complete (50 MB) file started at log in
(to be compressed soon)
info.120306.131743.log // <-- current file
mon.120102.080417.log.gz // <-- older mon file
mon.120229.081643.log.gz // <-- older mon file
mon.120306.080423.log // <-- current mon file (System + Error only)
Depending on whether you can schedule (cron) the cleanup task, you may simply spin up a cleanup thread within your application. Whether you go with a purge date or a limit on the number of files is a choice you have to make; either is effective.
Note: from experience, a 50 MB file ends up weighing around 10 MB when compressed on the fly and less than 5 MB when compressed offline (on-the-fly compression is less efficient).
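A rough sketch of that scheme in Python; the file naming, the 50 MB limit and the keep-the-newest-20 purge are just the example values above, and a real implementation would check the size less often and handle errors:
import glob
import gzip
import os
import shutil
import time

LOG_DIR = "Log"
MAX_BYTES = 50 * 1024 * 1024      # start a new file past ~50 MB
MAX_FILES = 20                    # keep only the 20 newest compressed files

class RotatingLog:
    def __init__(self):
        os.makedirs(LOG_DIR, exist_ok=True)
        self._open_new()

    def _open_new(self):
        stamp = time.strftime("%y%m%d.%H%M%S")             # creation date/time in the name
        self.path = os.path.join(LOG_DIR, "info.%s.log" % stamp)
        self.file = open(self.path, "a")

    def write(self, line):
        self.file.write(line + "\n")
        self.file.flush()
        if os.path.getsize(self.path) >= MAX_BYTES:        # current file is "full"
            self.file.close()
            self._compress(self.path)
            self._purge_old()
            self._open_new()

    def _compress(self, path):
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)                    # offline compression
        os.remove(path)

    def _purge_old(self):
        done = sorted(glob.glob(os.path.join(LOG_DIR, "info.*.log.gz")))
        for old in done[:-MAX_FILES]:                       # names sort chronologically
            os.remove(old)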
Your (3) is standard practice in the world of UNIX system logging.
When the log file reaches a certain age or maximum size, start a new one.
Zip or otherwise compress the old one.
Throw away the nth-oldest compressed log.
One way to deal with it is to rotate log files.
Start logging into a new file once you reach a certain size, and keep the last couple of log files before you start overwriting the first one.
You will not have all possible info, but you will at least have some of what led up to the issue.
The logging strategy sounds unusual but you have your reasons.
I would
a) Make the level of detail in the log messages configurable at run time (a sketch of one way to do this follows below).
b) Create a new log file for each day. You can then get cron to compress and/or delete them, or perhaps transfer them to off-line storage.
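One possible way to implement (a), sketched in Python rather than the asker's C++: flip the level at run time from outside the process, here via SIGUSR1, so whoever administers the machine can switch detailed logging on when a problem appears and off again afterwards:
import logging
import signal

logging.basicConfig(filename="myapp.log", level=logging.WARNING,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("myapp")

def toggle_detail(signum, frame):
    # Switch between "SYSTEM/ERROR only" (WARNING) and "everything" (DEBUG).
    root = logging.getLogger()
    new_level = logging.DEBUG if root.level > logging.DEBUG else logging.WARNING
    root.setLevel(new_level)
    log.warning("log level switched to %s", logging.getLevelName(new_level))

signal.signal(signal.SIGUSR1, toggle_detail)   # e.g. `kill -USR1 <pid>` to flip the level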
My answer is to write long logs and then tease out the info you want.
Compress them on a daily basis, but keep them for a week.
I like to log a lot. In some programs I've kept the last n lines in memory and written to disk in case of an error or the user requesting support.
In one program it would keep the last 400 lines in memory and save them to a logging database upon an error. A separate service monitored this database and sent an HTTP request containing summary information to a service at our office, which added it to a database there.
We had a program on each of our desktop machines that showed a list (updated by F5) of issues, which we could assign to ourselves and mark as processed. But now I'm getting carried away :)
This worked very well to help us support many users at several customers. If an error occurred on a PDA somewhere running our software then within a minute or so we'd get a new item on our screens. We'd often phone a user before they realised they had a problem.
We had a filtering mechanism to automatically process or assign issues that we knew we'd fixed or didn't care much about.
In other programs I've had hourly or daily files which are deleted after n days either by the program itself or by a dedicated log cleaning service.
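A small Python sketch of the keep-the-last-N-lines-in-memory idea mentioned above; the 400-line capacity matches the figure given, everything else is made up:
from collections import deque

class CrashBuffer:
    def __init__(self, capacity=400):
        self.lines = deque(maxlen=capacity)   # oldest lines fall off automatically

    def log(self, message):
        self.lines.append(message)            # cheap: stays in memory, nothing hits disk

    def dump(self, path):
        # Called on an error or when a user requests support: persist the
        # recent history to a file (or ship it to a database) for analysis.
        with open(path, "w") as f:
            f.write("\n".join(self.lines) + "\n")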