Custom Apache log files based on a regular expression (regex)

I'm working on a project which requires a special CentOS Apache server setup. All entries must be pre-processed, sorted by certain conditions, and then placed into separate log files.
Take these Apache requests, for instance:
http://logs.domain.com/log.gif?obj=test01|id=886655774|e=via|r=4524925241
http://logs.domain.com/log.gif?obj=test01|id=886655774|e=via|r=9746354562
For every entry, Apache must search through the request URL. If it finds the "obj"-parameter, a log with this name must be created. In this case, one file with the name "test01" is created (if it doesn't already exist), containing these two entries.
I found this, among others: http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html and assume it could be done with some kind of regex, but I still need a further push in the right direction, please.

Apache does not support this sort of fine-grained log routing. However, you could certainly write a log reader which reads from the Apache logs and writes the appropriate sub-logs.
There are three things to be aware of:
Apache log writes are not atomic and Apache uses no file locking. That means the program reading the log cannot be guaranteed that the last log entry is complete, and there is no reliable real-time way to tell whether Apache is currently writing to the log.
The error log can contain multi-line entries, and only the first line would match your pattern. Additionally, many CGI scripts will write lines without the appropriate prefix. This is very difficult to control unless you write all the code yourself. The access log, at least, is pretty reliably 1 line = 1 entry, with the standard prefix.
The reader must persist the current file's atime, ctime, and size somewhere on disk, so that if it is stopped and restarted it can continue where it left off. It must also correctly handle the switch-over when Apache begins writing a new log (the old log is renamed).
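A minimal sketch of such a reader in Python (the access-log path, output directory, and the obj= pattern are assumptions based on the example URLs above; persisting the read offset across restarts, which the last point requires, is left out for brevity):

```python
import os
import re
import time

# Matches the "obj" parameter in the logged request line, e.g.
# GET /log.gif?obj=test01|id=886655774|e=via|r=4524925241 HTTP/1.1
OBJ_RE = re.compile(r'[?|]obj=([A-Za-z0-9_-]+)')

def split_entry(line, out_dir):
    """Append one access-log line to <out_dir>/<obj>.log and return the
    obj name, or return None when the line has no obj= parameter."""
    m = OBJ_RE.search(line)
    if m is None:
        return None
    name = m.group(1)
    with open(os.path.join(out_dir, name + '.log'), 'a') as f:
        f.write(line)
    return name

def follow(log_path, out_dir):
    """Tail the access log forever, buffering partial lines (Apache's
    writes are not atomic) and reopening when logrotate renames the file."""
    while True:
        with open(log_path) as f:
            inode = os.fstat(f.fileno()).st_ino
            buf = ''
            while True:
                chunk = f.readline()
                if chunk:
                    buf += chunk
                    if buf.endswith('\n'):   # dispatch complete entries only
                        split_entry(buf, out_dir)
                        buf = ''
                    continue
                try:
                    rotated = os.stat(log_path).st_ino != inode
                except FileNotFoundError:
                    rotated = True
                if rotated:
                    break                    # log was rotated; reopen new file
                time.sleep(0.5)
```

The inode check handles the rename-on-rotation case; entries without a trailing newline are held back until the rest of the line arrives.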

Related

procmail: getting procmail to exclude hostname while saving Maildir format messages

How do I get procmail to save messages in my Maildir folder, but not include the hostname in the file (message name)? I get the following message names in my new/ sub-folder:
1464003587.H805375P95754.gator3018.hostgator.com, S=20238_2
I just want to eliminate the hostname. Is that possible to do using procmail? How? Separately, is it possible to replace the first timestamp with the time-sent timestamp? Is it possible to prescribe a format for procmail?
No, you can't override Maildir's filename format, not least because it is prescribed this way for interoperability. The format is guaranteed to be robust against clashes when multiple agents on multiple hosts concurrently write to the same message store. That can only work correctly if they all play by the same rules, and an obvious part of those rules is that the host name where the agent is running must be included in the filename of each new message.
The Wikipedia Maildir article has a good overview of the format's design and history, and of course links to authoritative standards and other primary sources.
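To see why the hostname matters, here is a simplified sketch of the classic unique-name construction (real Maildir names carry additional fields - consult the spec for the exact rules; this is illustration only):

```python
import os
import socket
import time

_seq = 0

def maildir_unique_name():
    """Simplified Maildir-style unique name: <time>.P<pid>_<seq>.<host>.
    Deliveries on different hosts can never collide because the hostname
    differs; deliveries within one process differ by the counter."""
    global _seq
    _seq += 1
    # The spec requires '/' and ':' in the hostname to be escaped.
    host = socket.gethostname().replace('/', '\\057').replace(':', '\\072')
    return '%d.P%d_%d.%s' % (int(time.time()), os.getpid(), _seq, host)
```

Strip the hostname and two agents on different machines delivering in the same second with the same PID would silently overwrite each other's mail.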
If you don't particularly require Maildir compatibility (with the tmp / new / cur subdirectories etc) you can simply create a unique mbox file on each run; if you can guarantee that it is unique, you don't need locking when you write to it.
For example, if you have a tool called uuid which generates a guaranteed unique identifier on each invocation, you can easily use that as the file name:
:0 # or maybe :0r
`uuid`
It should be easy to see how to supply your own tool instead, if you really think you can create your own solution for concurrent delivery. (Maildir solves concurrent and distributed delivery, so the requirements for that are stricter.)
The other formats supported by Procmail have their own hardcoded rules for how file names are generated, though perhaps the simple MH folder format, with a (basically serially incrementing) message number as the file name, would be worth investigating as well. The old mini-FAQ has a brief overview of the supported formats and how to select which one Procmail uses for delivery in each individual recipe.

web.config vs. text file for storing a comma-separated value

We have a collection of VB.NET / IIS web services on some of our servers, and they have web.config files in the websites' root directories that they already read configurations from. There is a new configuration value that needs to be added; it will immediately be quite a bit longer than the others, and it will only grow. It's essentially a comma-separated value, and I want to keep it in a configuration file of some sort.
At first I started doing this with a text file, but there was a problem with that. The text file's contents could change while web service threads and processes are running, so they would need to essentially re-read the file every time they needed to access its values. I thought about using some sort of caching, but unless the web services are completely restarted each time the file is updated, caching would block updates to the file from being used immediately. But reading from a text file each time is slow...
Then came the idea of putting that value in web.config, along with the other configurations the services are already using. When web.config is altered, the changes can be cached in the code and also take effect immediately. However, web.config is, well, web.config - it's not a trivial text file that the code simply reads; IIS treats web.config in a special manner.
I'm tempted to think any negative consequences of putting a comma-separated value in web.config would be outweighed by the drawbacks of storing it in a text file (or a database, which probably can't be used for this anyway), but I figured I'd better ask.
What are the implications of storing a possibly lengthy comma-separated value in web.config instead of in its own little text file? Is either file a particularly good or bad idea? To me it seems like web.config would let me avoid re-reading the file over and over, but there's certainly more to it than the average user is aware of. Thanks!
I recommend using the Application Cache for this:
http://msdn.microsoft.com/en-us/library/vstudio/6hbbsfk6(v=vs.100).aspx
When you insert the value, attach a CacheDependency on the text file; the cached entry is then evicted automatically the moment the file changes, so you get immediate updates without re-reading on every request. Note also that every edit to web.config recycles the ASP.NET application domain, which is itself a reason not to put a frequently changing value there.
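The mechanism behind that recommendation is a cache entry invalidated by a file dependency, so the value is re-read only when the file actually changes. A language-neutral sketch of the same idea (in Python purely for illustration; the real VB.NET code would use Cache.Insert with a CacheDependency on the text file):

```python
import os

class CsvFileCache:
    """Cache the parsed comma-separated values of a file, re-reading it
    only when its modification time changes (a poor man's file dependency)."""
    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._values = None

    def values(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:        # file changed, or first access
            with open(self.path) as f:
                self._values = [v.strip() for v in f.read().split(',')]
            self._mtime = mtime
        return self._values
```

Every access costs one stat() instead of a full read, and edits to the file take effect on the next call without restarting anything.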

Boost.Log - Multiple processes to one log file?

Reading through the doc for Boost.Log, it explains how to "fan out" into multiple files/sinks pretty well from one application, and how to get multiple threads working together to log to one place, but is there any documentation on how to get multiple processes logging to a single log file?
What I imagine is that every process would log to its own "private" log file, but in addition, any messages above a certain severity would also go to a "common" log file. Is this possible with Boost.Log? Is there some configuration of the sinks that makes this easy?
I understand that I will likely have the same "timestamp out of order" problem described in the FAQ here, but that's OK; as long as the timestamps are correct, I can work with that. This is all on one machine, so no remote-filesystem problems either.
My expectation is that the Boost.Log backends that directly write logfiles will keep those files open in between writing the log entries.
This will cause problems when using the same logfile from multiple processes: on Windows the file is typically opened for exclusive access, and even on systems that allow concurrent writers, the entries can interleave unpredictably.
There are a few Boost.Log backends that can be used to have all the logging end up in one place.
These are the syslog and Windows eventlog backends. Of these, the syslog backend is probably the easiest to use.
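For completeness, the arrangement the question describes - a verbose private sink per process plus a severity-filtered common sink - has a simple shape. This sketch uses Python's logging module purely to illustrate the filter layout (in Boost.Log the equivalent is two registered sinks with a severity filter on the shared one, and the shared-file caveats above still apply):

```python
import logging

def make_logger(name, private_path, common_path):
    """One verbose private log per process, plus a shared log that
    receives only warnings and above (the 'common' sink)."""
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    private = logging.FileHandler(private_path)
    private.setLevel(logging.DEBUG)           # everything goes here
    common = logging.FileHandler(common_path)
    common.setLevel(logging.WARNING)          # severity filter on shared sink
    fmt = logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s')
    for h in (private, common):
        h.setFormatter(fmt)
        log.addHandler(h)
    return log
```

The severity threshold on the shared handler is the whole trick: debug noise stays in the per-process file, while anything important fans out to both.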

Process queue as folder with files. Possible problems?

I have an executable that needs to process records in the database when the command arrives to do so. Right now I am issuing commands via a TCP exchange, but I don't really like it because
a) the queue is not persistent between sessions
b) the TCP port might get locked
The idea I have is to create a folder and place files in it whose names match the commands I want to issue.
Like:
1.23045.-1.1
2.999.-1.1
Then, after the command has been processed, the file will be deleted or moved to Errors folder.
Is this viable or are there some unavoidable problems with this approach?
P.S. The process will be used on Linux system, so Antivirus problems are out of the question.
Yes, a few.
First, there are all the problems associated with using a filesystem. Antivirus programs are one (though I cannot see why that wouldn't apply to Linux - no delete locks?). Disk space and maximum file and directory counts are others; then there are open-file limits and permissions...
Second, race conditions. If there are multiple consumers, more than one of them might see and start processing the command before the first one has [re]moved it.
There are also the issues of converting commands to filenames and vice versa, and coming up with different names for a single command that needs to be issued multiple times. (Though these are programming issues, not design ones; they'll merely annoy.)
None of these may apply or be of great concern to you, in which case I say: Go ahead and do it. See what we've missed that Real Life will come up with.
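Regarding the race condition, there is a standard fix: each consumer claims a command file with an atomic rename before processing it. A sketch in Python (directory names are placeholders; os.rename is only atomic on POSIX within a single filesystem, so keep the work directory on the same mount):

```python
import os

def claim(queue_dir, work_dir):
    """Try to claim one pending command file by atomically renaming it
    into this consumer's work directory. Returns the claimed path, or
    None if the queue is empty. Because rename is atomic, if two
    consumers race for the same file exactly one rename succeeds and
    the loser simply moves on to the next file."""
    for name in sorted(os.listdir(queue_dir)):
        src = os.path.join(queue_dir, name)
        dst = os.path.join(work_dir, name)
        try:
            os.rename(src, dst)
            return dst
        except OSError:
            continue        # another consumer got this one first
    return None
```

After processing, the consumer deletes the claimed file or renames it again into the Errors folder, exactly as described above.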
I probably would use an MQ server for anything approaching "serious", though.

Logging Etiquette

I have a server program that I am writing. In this program I log a lot. Is it customary in logging (for a server) to overwrite the log of previous runs, append to the file with some sort of new-run header, or create a new log file each run (it won't be restarted too often)?
Which of these solutions is the way of doing things under Linux/Unix/MacOS?
Also, can anyone suggest a logging library for C++/C? I need one, regardless of the answer to the above question.
Take a look in /var/log/ - you'll see that files are structured like:
serverlog
serverlog.1
serverlog.2
This is done by logrotate which is called in a cronjob. But everything is simply in chronological order within the files. So you should just append to the same log file each time, and let logrotate split it up if needed.
You can also add a configuration file to /etc/logrotate.d/ to control how a particular log is rotated. Depending on how big your logfiles are, it might be a good idea to add here information about your logging. You can take a look at other files in this directory to see the syntax.
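For example, a minimal /etc/logrotate.d/myserver entry might look like this (the path, schedule, and retention count are placeholders to adapt):

```
/var/log/myserver/serverlog {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
}
```

This keeps eight weekly generations, compressing all but the most recent rotated file so it stays greppable while logrotate catches up.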
This is a rather complex issue. I don't think that there is a silver bullet that will kill all your concerns in one go.
The first step in deciding what policy to follow would be to set your requirements. Why is each entry logged? What is its purpose? In most cases this will result in some rather concrete facts, such as:
You need to be able to compare the current log with past logs. Even when an error message is self-evident, the process that led to it can be determined much faster by playing spot-the-difference, rather than puzzling through the server execution flow diagram - or, worse, its source code. This means that you need at least one log from a past run - overwriting blindly is a definite No.
You need to be able to find and parse the logs without going out of your way. That means using whatever facilities and policies are already established. On Linux it would mean using the syslog facility for important messages, to allow them to appear in the usual places.
There is also some good advice to heed:
Time is important. Not only because there's never enough of it, but also because log files without proper timestamps for each entry are practically useless. Make sure that each entry has a timestamp - most system-wide logging facilities will do that for you. Also make sure that the clocks on all your computers are as accurate as possible - using NTP is a good way to do that.
Log entries should be as self-contained as possible, with minimal cruft. You don't need to have a special header with colors, bells and whistles to announce that your server is starting - a simple MyServer (PID=XXX) starting at port YYYYY would be enough for grep (or the search function of any decent log viewer) to find.
You need to determine the granularity of each logging channel. Sending several GB of debugging log data to the system logging daemon is not a good idea. A good approach might be to use separate log files for each logging level and facility, so that e.g. user activity is not mixed up with low-level data that is only useful when debugging the code.
Make sure your log files are in one place, preferably separated from other applications. A directory with the name of your application is a good start.
Stay within the norm. Sure you may have devised a new nifty logfile naming scheme, but if it breaks the conventions in your system it could easily confuse even the most experienced operators. Most people will have to look through your more detailed logs in a critical situation - don't make it harder for them.
Use the system log handling facilities. E.g. on Linux that would mean appending to the same file and letting an external tool like logrotate handle the log files. Not only is it less work for you, it also automatically keeps your logs consistent with any system-wide logging policies.
Finally: always copy important log data to the system log as well. Operators watch the system logs. Please, please, please don't make them look in other places just to find out that your application is about to launch the ICBMs...
https://stackoverflow.com/questions/696321/best-logging-framework-for-native-c
For the logging, I would suggest creating a new log file and rotating it at a certain frequency to keep it from growing too large. Overwriting the logs of previous runs is usually a bad idea.