I have a problem: I configured daily log rotation for catalina.out on CentOS 7, but it is not rotated automatically. If I force-run logrotate it rotates catalina.out, but it does not happen on a daily basis by itself.
logrotate.d/tomcat config file:
/usr/local/tomcat7/logs/catalina.out
{
daily
rotate 30
missingok
compress
copytruncate
}
logrotate.conf:
# see "man logrotate" for details
# rotate log files daily
daily
# keep 4 weeks worth of backlogs
rotate 4
# create new (empty) log files after rotating old ones
create
# use date as a suffix of the rotated file
dateext
# uncomment this if you want your log files compressed
#compress
# RPM packages drop log rotation information into this directory
include /etc/logrotate.d
# no packages own wtmp and btmp -- we'll rotate them here
/var/log/wtmp {
monthly
create 0664 root utmp
minsize 1M
rotate 1
}
/var/log/btmp {
missingok
monthly
create 0600 root utmp
rotate 1
}
# system-specific logs may be also be configured here.
logrotate status/debug:
rotating pattern: /usr/local/tomcat7/logs/catalina.out
after 1 days (30 rotations)
empty log files are rotated, old logs are removed
considering log /usr/local/tomcat7/logs/catalina.out
log does not need rotating (log has been already rotated)
"/usr/local/tomcat7/logs/catalina.out" 2019-8-5-9:25:18
In its default state Tomcat uses log4j.
You should have a file at /etc/tomcat/log4j.properties which contains the configuration for log management.
The default config is (taken from a test box):
log4j.rootLogger=debug, R
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=${catalina.home}/logs/tomcat.log
log4j.appender.R.MaxFileSize=10MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.catalina=DEBUG, R
log4j.logger.org.apache.catalina.core.ContainerBase.[Catalina].[localhost]=DEBUG, R
log4j.logger.org.apache.catalina.core=DEBUG, R
log4j.logger.org.apache.catalina.session=DEBUG, R
Based on that config the logs will automatically rotate when they grow to 10MB in size, and will keep a maximum of 10 old logs.
You can change these settings as you wish; there are several good guides that explain all the options and show how to switch to a rolling appender, which may be more useful for your needs.
Also, log4j takes care of the rotation, but if you are doing something like tail -f catalina.out and the log rotates, you will need to re-tail the file to continue watching it; otherwise it will just appear to stop mid-way (much like the other logs do).
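As a side note (an assumption about how you watch the logs, not part of the log4j setup): on most Linux systems tail -F follows the file by name and reopens it after rotation, so you don't have to re-tail manually:
tail -F /usr/local/tomcat7/logs/catalina.out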
Remember to remove any config you tried to apply via logrotate so things don't go wrong later on!
To have daily rotation you need to use these settings:
DailyRollingFileAppender
DailyRollingFileAppender rotates log files based on a time frequency, allowing
customization down to the minute. The date patterns allowed as part of
the appender are as follows:
yyyy-MM Roll over to new log file beginning on first day of every month
yyyy-ww Roll over to new log file beginning on first day of every week
yyyy-MM-dd Roll over daily
yyyy-MM-dd-a Roll over on midday and midnight
yyyy-MM-dd-HH Roll over every hour
yyyy-MM-dd-HH-mm Roll over every minute
That would give a config of:
log4j.rootLogger=INFO, fileLogger
log4j.appender.fileLogger=org.apache.log4j.DailyRollingFileAppender
log4j.appender.fileLogger.File=example.log
log4j.appender.fileLogger.DatePattern='.'yyyy-MM-dd
log4j.appender.fileLogger.layout=org.apache.log4j.PatternLayout
log4j.appender.fileLogger.layout.ConversionPattern=%d [%t] %-5p (%F:%L) - %m%n
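For illustration (the dates are made up): with the '.'yyyy-MM-dd pattern the appender renames the file at each day boundary, so you end up with something like:
example.log              <- current day
example.log.2019-08-04   <- yesterday
example.log.2019-08-03   <- the day before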
I recently did analysis on a static log file with Spark SQL (finding things like the IP addresses that appear more than ten times). The problem was from this site, but I used my own implementation for it: I read the log into an RDD, turned that RDD into a DataFrame (with the help of a POJO), and used DataFrame operations.
Now I'm supposed to do a similar analysis using Spark Streaming on a streaming log file, for a window of 30 minutes as well as aggregated results for a day. The solution can again be found here, but I want to do it another way. So what I've done is this:
Use Flume to write data from the log file to an HDFS directory
Use JavaDStream to read the .txt files from HDFS
Then I can't figure out how to proceed. Here's the code I use
Long slide = 10000L; //new batch every 10 seconds
Long window = 1800000L; //30 mins
SparkConf conf = new SparkConf().setAppName("StreamLogAnalyzer");
JavaStreamingContext streamingContext = new JavaStreamingContext(conf, new Duration(slide));
JavaDStream<String> dStream = streamingContext.textFileStream(hdfsPath).window(new Duration(window), new Duration(slide));
Now I can't decide whether I should turn each batch into a DataFrame and do what I previously did with the static log file, or whether that approach is time-consuming and overkill.
I'm an absolute noob to Streaming as well as Flume. Could someone please guide me with this?
Using DataFrames (and Datasets) is the recommended approach in recent versions of Spark, so it's the right choice to go with. I think some of the confusion comes from the non-explicit nature of the stream, since you move files into HDFS rather than read from an event log.
The main point here is to choose the correct batch interval (or slide size, as in your snippet), so the application can process the data it loaded within that time slot and batches do not queue up.
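As a rough sketch of that approach (not part of the answer above): convert each windowed batch to a DataFrame and run the same query you used on the static file. LogRecord here is a hypothetical POJO with a static parse(String) factory and an ip field; dStream and the usual org.apache.spark.sql imports are assumed from your snippet.
dStream.foreachRDD(rdd -> {
    if (!rdd.isEmpty()) {
        // Reuse (or lazily create) a SparkSession for this batch
        SparkSession spark = SparkSession.builder()
                .config(rdd.context().getConf())
                .getOrCreate();
        // Parse each raw line into the (hypothetical) LogRecord POJO
        Dataset<Row> logs = spark.createDataFrame(rdd.map(LogRecord::parse), LogRecord.class);
        logs.createOrReplaceTempView("logs");
        // Same query as the static analysis: IPs seen more than ten times in the window
        spark.sql("SELECT ip, COUNT(*) AS hits FROM logs GROUP BY ip HAVING COUNT(*) > 10").show();
    }
});
Whether this is overkill depends mostly on the data volume per window; re-running the query on every 10-second slide over a 30-minute window does repeat work, so a larger slide may be enough if you only need results every few minutes.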
We have 37 Informatica sessions, and most of them have around 25 tables on average. A few sessions have one table as source and target. Our source is Oracle and our target is a Greenplum database. We are using PowerExchange 10.1 installed on the Oracle source to fetch our changed records.
We have noticed that the sessions with more tables take more time to fetch the data and update the target. Does adding more tables cause any delay in processing? If so, how can we tune it to fetch the records as fast as possible?
We run 19 CDC mappings with between 17 and 90 tables in each, and have recently had a breakthrough in performance. The number of tables is not the most significant limiting factor for us; PowerCenter and PowerExchange are. Our source is DB2 on z/OS, but that is probably not important ...
This is what we did:
1) We increased the DTM buffer block size to 256 KB and the DTM buffer size to 1 GB or more; a 'complex' mapping needs many buffer blocks.
2) We changed the connection attributes to:
- Realtime flush latency=86000 (max setting)
- Commit size in the session was set extremely high (to allow the above setting to be the deciding factor)
- UOW count=-1 (same reason as above)
- maximum rows per commit=0
- minimum rows per commit=0
3) We set the session property 'recovery strategy' to 'fail task and continue workflow' and implemented our own solution to create a 'restart token file' from scratch every time the workflow starts.
Only slightly off topic: the way we implemented this was with an extra table (we call it a SYNC table) containing one row only. That row is updated every 10 minutes on the source by a very reliable scheduled process (a small CICS program). The content of this table is written to the target database once per workflow, and an extra column is added in the mapping that contains the content of $$PMWorkflowName. Apart from the workflow-name column, the two DTL__Restart1 and DTL__Restart2 columns are written to the target as well.
During startup of the workflow we run a small reusable session, before the actual CDC session, which reads the record for the current workflow from the SYNC table on the target side and creates the RESTART file from scratch.
[Please note that you will end up with duplicates covering up to 10 minutes (from workflow start time) in the target. We accept that and aggregate them away in all mappings reading from these tables.]
Try tinkering with combinations of these and tell us what you experience. We now have a maximum throughput in a 10-minute interval of 10-100 million rows per mapping. Our target is Netezza (aka PDA from IBM).
One more thing I can tell you:
Every time a commit is triggered (every 86 seconds with the above settings), PowerCenter will empty all its writer buffers against all of the tables in one big commit scope. If any of these tables is locked by another process, you may end up with a lot of cascaded locking on the writer side, which will make the CDC seem slow.
I have configured daily log rotation for Apache.
When a new day comes, for example at 00:00 today (07/31/2017), a new access.log file is created and the old access.log file is renamed to access.log-31072017.
The problem: tomorrow, access.log will be renamed to access.log-01082017 (fine) and a new access.log will be created (fine), but the access.log-31072017 file is lost (ouch).
Here is what I did:
vi /etc/logrotate.d/httpd
Insert at the end of the file:
/home/*/logs/*log{
missingok
notifempty
sharedscripts
delaycompress
postrotate
/bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true
endscript
}
Rotate config
vi /etc/logrotate.conf
Change weekly to daily
Change rotate 4 to rotate 1
The log files are written under the /home/example.com/logs/ path.
How do I retain the files from previous days?
Thanks in advance.
Try changing the rotate value in /etc/logrotate.conf back to 4. Despite the comment in logrotate.conf, it is not the number of weeks the logs are kept but the number of times the files are rotated before they are deleted.
The manpage for logrotate.conf explains this more clearly:
rotate count
Log files are rotated count times before being removed or mailed to the address specified in a mail directive. If count is 0, old versions are removed rather than rotated. Default is 0.
Setting it to 4 should keep the old logs for four days.
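With the default daily run from /etc/cron.daily, the relevant part of /etc/logrotate.conf would then look something like this (a minimal excerpt, not a full file):
# rotate log files daily
daily
# keep 4 days of old logs before deleting them
rotate 4
# use the date as a suffix of the rotated file
dateext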
I'm trying to make a daily build machine using EC2 and store the daily releases in S3.
The releases are complete disk images, so they are very bloated (300+ MB total: 95% OS kernel/RFS/libraries, 5% actual software). And they change very little over time.
Ideally, with good compression, the storage cost should be close to O(t), t for time.
But if I simply add those files to S3 every day, with version number as part of file name, or with the same file name each time but with the S3 bucket versioned, the cost would be O(t^2).
Because, according to this, all versions take up space and I'm charged for the space each new version takes from the moment it is created.
Glacier is cheaper but still O(t^2).
Any suggestions?
Basically, what you're looking for is an incremental file-level backup (i.e. only back up things that change), rebuilding the current state by taking a full backup and applying the deltas (i.e. increments).
If you need to use the latest image, you probably need to do incrementals + keep the latest image. You also probably want to do full backups from time to time to reduce the time it takes to rebuild from the increments (and you are going to need to keep some sort of metadata associated with the backups).
So to sum it up: what you are describing is possible, you just need to do extra work apart from just pushing the image. Presumably you have a build process that generates the image, and the extra steps can be inserted between generation and upload. The restore process is going to be more complicated than it is currently.
To get you started, look at binary diff tools like bsdiff/bspatch or xdelta. You could generate the delta and back up only the delta. The image is also compressed, so if you diff the compressed versions you will not get very far; you probably want to diff the uncompressed files. Another way to look at it is to do the diff before generating the image and pick up only the files that changed (probably more complex).
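As a rough sketch of the delta approach (the file names, bucket name, and choice of xdelta3 are illustrative assumptions):
# create a binary delta between yesterday's and today's uncompressed images
xdelta3 -e -s image-yesterday.img image-today.img image-today.vcdiff
# store only the (much smaller) delta
aws s3 cp image-today.vcdiff s3://my-build-bucket/deltas/
# to restore, apply the delta to the previous image
xdelta3 -d -s image-yesterday.img image-today.vcdiff image-today.img
You would still upload a full image periodically so a restore never has to replay too long a chain of deltas.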
I am facing a problem with the settings in
logrotate.conf
I had done the settings once, but it didn't work as expected.
The main requirement is to rotate the log files, with compression, at an interval of 5 days:
/var/log/humble/access.log
{
daily
copytruncate
rotate 5
create 755 humble humble
dateext
compress
include /etc/logrotate.d/humble/
}
Even after doing this, compression stopped after a few days.
Your logrotate.conf file should have an include line for your file "humble":
include /etc/logrotate.d/humble
#End of Logrotate.conf
and then in your /etc/logrotate.d/humble
/var/log/humble/access.log
{
daily
copytruncate
rotate 5
create 755 humble humble
dateext
compress
}
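To check that the file is picked up without actually rotating anything, you could do a debug run (the -d flag only prints what logrotate would do; the standard config path is assumed):
logrotate -d /etc/logrotate.conf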
The number specified after rotate is how many rotated files are kept as backups. Here it is 5.
Also, you need to add a rule in the crontab file to trigger logrotate every 5 days.
The crontab schedule for running it every 5 days is:
0 0 */5 * *
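A complete crontab entry also needs the command to run; one possible form (the logrotate binary path and the -f force flag are assumptions, adjust for your system):
0 0 */5 * * /usr/sbin/logrotate -f /etc/logrotate.d/humble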