CentOS 7 syslog rate limiting strangeness

I'm seeing some confusing behaviour from syslog rate limiting on CentOS 7.
With this release it seems that rsyslogd gets its input from journald, which means the first place syslog messages are rate limited is journald.
In journald.conf I can adjust the rate limit with these two lines:
RateLimitInterval=60s
RateLimitBurst=100
Setting RateLimitInterval to 0 seems to properly disable it and no rate limit is in place.
What is really strange is that any other values don't behave quite the way you'd expect. If I sit there trying to flood syslog with roughly 1000 lines/minute, the settings above should limit me to actually logging 100 lines/minute. But what happens? I actually log 275 lines/minute.
If I change the RateLimitBurst value to 1000, instead of logging 1000/minute, I log about 2750.
No matter what I do, I get an actual log rate of about 2.75 times the expected rate.
It would seem that something in journald is maybe doing some incorrect time conversion.
This is easy to reproduce; it happens 100% of the time on any of the CentOS 7.2 systems I am using.
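(For anyone wanting to reproduce it: the flood can be generated with a trivial C++ sketch along these lines, using syslog(3). The ident string and the counts are arbitrary, not my exact test code.)
#include <syslog.h>
#include <unistd.h>

int main()
{
    // Open a connection to syslog; on CentOS 7 these messages pass through journald first.
    openlog("ratelimit-test", LOG_PID, LOG_USER);

    // Try to emit ~1000 messages spread over one minute.
    for (int i = 0; i < 1000; ++i) {
        syslog(LOG_INFO, "rate limit test message %d", i);
        usleep(60000);   // 60 ms apart -> about 1000 messages per minute
    }

    closelog();
    return 0;
}
Count how many of those lines actually reach the target log (e.g. grep for the ident) and compare against RateLimitBurst.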
After messages are limited by journald they are further limited by the imjournal setting in rsyslogd, but I don't think that is coming into play here.
Anyone else seeing this? It matters because I have an app that is a little noisy, spewing quite a few messages that are diverted to a specific file, and I need to raise the limit, but I want to do it in a sensible way.

Related

Intercept another window's output

I have the project to develop an application that would allow a computer to 'send' a window to another computer.
In order to do that, I need, of course, to capture the output of the window in question from my program.
Google searches led me to no relevant results, with either libX11 or libxcb.
I also tried to record screenshots with xwd and import, but as they are quite slow, I'm only getting up to 3.5 fps.
Any help on how I could do that will be welcome (either using libX11, libxcb, or something else)
By the way, I intend to use C++ for this program.
Thanks for reading,
Edit:
The fps test was done without sending files. It was just like "I took screenshots for 5 minutes, and I got 900 pictures".
I think you will need to capture screenshots and compress them before sending them over the network to make things faster. You may also need to reduce the quality of the screenshots (user configurable) to speed things up further.
Plus there are different techniques to send only the changes (diff of screenshots) to the other computer.
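As a rough starting point for the capture itself, XGetImage from plain Xlib should be much faster than shelling out to xwd/import. A minimal sketch (it grabs the root window here, so substitute the target window's ID; error handling is omitted):
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <cstdio>

int main()
{
    Display *display = XOpenDisplay(nullptr);
    if (!display)
        return 1;

    // For the sketch, grab the root window; replace with the target window's ID.
    Window xwindow = DefaultRootWindow(display);

    XWindowAttributes attrs;
    XGetWindowAttributes(display, xwindow, &attrs);

    // Pull the window contents into client memory as a ZPixmap.
    XImage *img = XGetImage(display, xwindow, 0, 0,
                            attrs.width, attrs.height,
                            AllPlanes, ZPixmap);
    if (img) {
        std::printf("captured %dx%d, %d bits per pixel\n",
                    img->width, img->height, img->bits_per_pixel);
        // img->data now holds the raw pixels; compress/diff and send them,
        // then free the image.
        XDestroyImage(img);
    }

    XCloseDisplay(display);
    return 0;
}
For repeated captures of the same window, the MIT-SHM extension (XShmGetImage) is usually faster still, and the XDamage extension can tell you which regions changed so you only send diffs.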

Windows notifications when hibernation fails

I'm coding using C++ with WinAPIs, and to hibernate the computer I use the following call:
SetSuspendState(TRUE, NULL, FALSE);
But what happens is that if the computer has a larger amount of RAM installed, hibernation fails.
So I was wondering, does Windows send any notifications if hibernation fails? And if not, how to tell if my request to hibernate failed?
Looks like there's no direct way to detect hibernation [CORRECTION: I was wrong about this. See Fowl's answer.] until Windows 8 (see PowerRegisterSuspendResumeNotification). But I suppose you could idle-loop and watch the system time. If the time suddenly jumps forwards, you've successfully hibernated (and resumed!) and if this hasn't happened within a minute or so the request probably failed. I think you can use the GetTickCount64 function, which is insensitive to system time changes but apparently includes a bias for time spent sleeping. If this doesn't work, use GetSystemTimeAsFileTime but also watch for WM_TIMECHANGE messages.
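To make that concrete, here is a rough sketch of the watch-the-clock idea (the thresholds and the one-minute window are arbitrary, and HibernateAndCheck is just a made-up name):
#include <windows.h>
#include <powrprof.h>   // SetSuspendState; link with PowrProf.lib

// Ask for hibernation, then watch for a jump in the tick count.
// Returns true if we appear to have hibernated and resumed, false otherwise.
bool HibernateAndCheck()
{
    ULONGLONG prev = GetTickCount64();   // milliseconds since boot, and it keeps
                                         // counting across sleep/hibernation

    if (!SetSuspendState(TRUE, FALSE, FALSE))
        return false;                    // request rejected immediately

    for (int i = 0; i < 60; ++i) {       // watch for about a minute
        Sleep(1000);
        ULONGLONG now = GetTickCount64();
        if (now - prev > 10000)          // we only slept 1 s, but 10+ s elapsed:
            return true;                 // the machine was suspended in between
        prev = now;
    }
    return false;                        // no jump seen -> request probably failed
}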
You could also check on the system in question whether Windows writes anything to the event log when hibernation fails. If so, your application could monitor the event log for the relevant entry. This would be a more reliable approach.
Register for power setting notifications (RegisterPowerSettingNotification), then listen for WM_POWERBROADCAST, and then interrogate the event log to get more detail.
There's a bit of messing around if you want to handle multiple OS versions, but it's doable.
Hm... maybe I'm missing the point here, but according to the docs it should return FALSE if it failed. Does it still return TRUE in your case?
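In other words, the minimal check would be something like this (GetLastError should say why, assuming the failure is reported synchronously at all):
if (!SetSuspendState(TRUE, NULL, FALSE)) {
    DWORD err = GetLastError();   // the request was refused; err says why
    // ... log or display err ...
}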

DirectInput8 EnumDevices sometimes painfully slow

Sometimes (in about 50% of runs), EnumDevices takes 5-10 seconds to return. Normally it is almost instant. I couldn't find any other reports of this kind of behaviour.
When things are this slow, it's ok to profile by watching stdout :) This:
std::cout << "A" << std::flush;   // flush so the marker appears before any hang
directInput8Interface->EnumDevices(DI8DEVCLASS_GAMECTRL, MyCallback, NULL, DIEDFL_ATTACHEDONLY);
std::cout << "C" << std::flush;
...
BOOL CALLBACK MyCallback(LPCDIDEVICEINSTANCE, LPVOID)
{
    std::cout << "B" << std::flush;
    return DIENUM_CONTINUE;
}
Seems to hang at a random point through enumerating devices - sometimes it'll be before the callback is called at all, sometimes after a couple, and sometimes it will be after the last call to it.
This is clearly a simplified chunk of code; I'm actually using the OIS input library ( http://sourceforge.net/projects/wgois/ ), so for context, please see the full source here:
http://wgois.svn.sourceforge.net/viewvc/wgois/ois/trunk/src/win32/Win32InputManager.cpp?revision=39&view=markup
There doesn't seem to be anything particularly fruity going on there though, but possibly something in their initialisation could be the cause - I don't know enough about DI8 to spot it.
Any ideas about why it could be so slow will be greatly appreciated!
EDIT:
I've managed to catch the hang in an etl trace file and analysed it in Windows Performance Analyzer. It looks like EnumDevices eventually calls through to DInput8.dll!fGetProductStringFromDevice, which calls HIDUSB.SYS!HumCallUSB, which calls KeWaitForSingleObject and waits. 9 times out of 10 (literally - there are 10 samples in the trace) this returns very quickly (324us each), with the readying callstack containing usbport.sys!USBPORT_Core_iCompleteDoneTransfer followed by HIDUSB.SYS!HumCallUsbComplete, which looks quite normal.
But 1 time in 10, this takes almost exactly 5 seconds to return. On the readying callstack is ntkrnlmp.exe!KiTimerExpiration instead of the HIDUSB.SYS function. I guess all this indicates that the HIDUSB.SYS driver is querying devices asynchronously with a timeout of 5 seconds, and sometimes it fails and hits this timeout.
I don't know whether this failure is associated with any one device in particular (I do have a few USB HIDs) or if it's random - it's hard to test because it doesn't always happen. Again, any information anyone can give me will be appreciated, though I don't hold out any hope for Microsoft fixing this any time soon given the odd situation DirectInput is in!
Perhaps I'll just have to start initialising input earlier, asynchronously, and accept that sometimes there'll be a 5 second delay before user input can happen.
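For what it's worth, that workaround can be as small as kicking the enumeration off on a std::async task and only blocking on the result when input is first needed. A sketch (made-up function names, and the usual COM threading caveats apply):
#define DIRECTINPUT_VERSION 0x0800
#include <dinput.h>
#include <future>
#include <vector>

// Collect attached game controllers into a vector supplied via the context pointer.
static BOOL CALLBACK CollectDevice(LPCDIDEVICEINSTANCE inst, LPVOID ctx)
{
    static_cast<std::vector<DIDEVICEINSTANCE> *>(ctx)->push_back(*inst);
    return DIENUM_CONTINUE;
}

// Run the (possibly slow) EnumDevices call on a background thread so it
// doesn't stall startup.
std::future<std::vector<DIDEVICEINSTANCE>> StartEnumeration(LPDIRECTINPUT8 di)
{
    return std::async(std::launch::async, [di] {
        std::vector<DIDEVICEINSTANCE> devices;
        di->EnumDevices(DI8DEVCLASS_GAMECTRL, CollectDevice,
                        &devices, DIEDFL_ATTACHEDONLY);
        return devices;
    });
}

// Later, when input is first needed:
//   std::vector<DIDEVICEINSTANCE> devices = enumFuture.get();  // blocks only if still running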
I was running into this too, largely as an end user, but it's been annoying the hell out of me for years. I didn't realize it was this issue until I ran into it on an open source project and was able to debug it.
Turns out it was my USB Headphone DAC (The Objective DAC from Massdrop), it installs the driver: wdma_usb.inf_amd64_134cb113911feba4\wdma_usb.inf for Device Instance ID USB\VID_262A&PID_1048&MI_01\7&F217D4F&0&0001 and then shows up in Device Manager under Sound, video and game controllers as: ODAC-revB USB DAC and, under Human Interface Devices as: USB Input Device and HID-compliant consumer control device.
I have no idea what the HID entries do but... When they are enabled and this DAC is set as the Audio Output device both IDirectInput8_CreateDevice and EnumDevices are painfully slow. Disabling the "USB Input Device" entry seems to cause no negative effects and completely solves my issue.
Changing the Audio output from the DAC to anything else also weirdly solved the issue.
This was so bad that it made the Gamepad Configuration dialog joy.cpl unusable, hanging and eventually crashing.
I wanted this to just be a comment but I don't have enough rep for it, and this is pretty much the only place on the internet that describes this problem, so hopefully this helps someone else one day!
I had the same issue. I have a Corsair K65 LUX RGB keyboard. I updated CUE and it seems to have fixed the issue
Got the same issue with my Corsair K55 keyboard. Changing the keyboard's USB port fixes the issue for a while, but then it comes back later on. So it seems to be a buggy driver issue.
As DaFox has pointed out, a likely cause appears to be certain device drivers being enabled. I contacted JDS Labs support (who sell one device which happens to install one such driver) and they kindly pointed out that the root cause is actually a bug within Windows (not the installed driver), and they actually provide the solution on their troubleshooting page. See Games hang or experience loading delays, which explicitly mentions VID_262. Disabling this driver fixes the issue without apparent side effects (under the condition that that is the only driver triggering the bug). As for what exactly is going wrong within Windows, here there be dragons.
So I guess the go-to solution (for users) is to scrape all the troubleshooting and FAQ pages for all devices which you have ever connected to your system and see if there is a mention of delays/lag caused by a driver.
As a software developer, you will probably want to benchmark the execution time of the affected code and kindly tell the user there is something wrong with their system configuration and where to look for how to fix it in case it is unreasonably long.
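Something along these lines, for instance, reusing the names from the snippet in the question (the 2-second threshold is arbitrary):
#include <chrono>
#include <iostream>

// Time the enumeration and warn the user if it is suspiciously slow.
const auto start = std::chrono::steady_clock::now();
directInput8Interface->EnumDevices(DI8DEVCLASS_GAMECTRL, MyCallback, NULL, DIEDFL_ATTACHEDONLY);
const auto elapsed = std::chrono::steady_clock::now() - start;

if (elapsed > std::chrono::seconds(2)) {
    // Enumeration shouldn't take anywhere near this long; point the user at
    // their device drivers rather than letting them blame the application.
    std::cerr << "Input device enumeration took "
              << std::chrono::duration_cast<std::chrono::seconds>(elapsed).count()
              << " s - a HID driver on this system is probably misbehaving.\n";
}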
Same issue with Corsair K70 Keyboard.
Quickly reconnecting the keyboard fixes this, until next time. It usually happens after some DirectInput devices are removed from the system or go to sleep.
This has been plaguing me as a developer and my friend as a user for years. All games using DInput, SDL's SDL_INIT_JOYSTICK, or anything depending on them took extremely long to initialize.
It was caused by a faulty driver of a DAC, and as pointed out by DaFox, disabling the corresponding USB Input Device resolved the issue. Although it's labeled with a different manufacturer name, the vendor IDs match.
The hardware ID of the device is USB\VID_262A&PID_9023&REV_0001&MI_00.
Same issue appears to happen with a SteelSeries Apex 7 keyboard. Unplugging that keyboard and plugging it back in again got rid of 3 freezes (of 10 seconds each) while enumerating USB devices.

What is your strategy to write logs in your software to deal with possible HUGE amount of log messages?

Thanks for your time and sorry for this long message!
My work environment
Linux, C/C++ (but I'm new to the Linux platform)
My question in brief
In the software I'm working on we write a LOT of log messages to local files, which makes the files grow fast and eventually use up all the disk space (ouch!). We want these log messages for troubleshooting purposes, especially after the software is released to the customer site. It is of course unacceptable to take up all the disk space of the customer's computer, but I have no good idea how to handle this, so I'm wondering if somebody has any good ideas here. More info goes below.
What I am NOT asking
1). I'm NOT asking for a recommended C++ log library. We wrote a logger ourselves.
2). I'm NOT asking about what details (such as time stamp, thread ID, function name, etc.) should be written in a log message. Some suggestions can be found here.
What I have done in my software
I separate the log messages into 3 categories:
SYSTEM: Only log the important steps in my software. Example: an external invocation of an interface method of my software. The idea is that from these messages we can see what is generally happening in the software. There aren't many such messages.
ERROR: Only log error situations, such as an ID not being found. There usually aren't many such messages.
INFO: Log the detailed steps running inside my software. For example, when an interface method is called, a SYSTEM log message is written as mentioned above, and the entire call path into the internal modules within the interface method is recorded with INFO messages. The idea is that these messages help us identify the detailed call stack for troubleshooting or debugging. This is the source of the use-up-disk-space issue: there are always SO MANY INFO messages when the software is running normally.
My tries and thoughts
1). I tried not recording any INFO log messages. This resolves the disk space issue, but I also lose a lot of information for debugging. Think about this: my customer is in a different city and it's expensive to go there often. Besides, they use an intranet that is 100% inaccessible from outside. Therefore: we can't always send engineers on-site as soon as they meet problems, and we can't start a remote debug session. Thus log files, I think, are the only thing we can use to figure out the root of the trouble.
2). Maybe I could make the logging strategy configurable at run time (currently it's configured before the software runs), that is: at normal run time, the software only records SYSTEM and ERROR logs; when a problem arises, somebody could change the logging configuration so the INFO messages get logged too. But still: who could change the configuration at run time? Maybe we should educate the software admin? (A rough sketch of the mechanics is after this list.)
3). Maybe I could always turn the INFO message logging on but pack the log files into a compressed package periodically? Hmm...
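Here is the rough sketch I mentioned for 2). The names are made up, and the level could just as well be re-read from a config file instead of being flipped by a signal:
#include <atomic>
#include <csignal>
#include <cstdio>

// A global level that every log call checks, which an admin can flip at
// run time with `kill -USR1 <pid>`.
enum Level { LVL_SYSTEM = 0, LVL_ERROR = 1, LVL_INFO = 2 };

static std::atomic<int> g_level{LVL_ERROR};      // SYSTEM + ERROR only by default

static void ToggleVerbose(int)                   // signal handler: flip INFO on/off
{
    g_level.store(g_level.load() == LVL_INFO ? LVL_ERROR : LVL_INFO);
}

void InitLogging()
{
    std::signal(SIGUSR1, ToggleVerbose);
}

void LogInfo(const char *msg)
{
    if (g_level.load() >= LVL_INFO)
        std::fprintf(stderr, "INFO: %s\n", msg);  // in reality: write to the log file
}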
Finally...
What is your experience in your projects/work? Any thoughts/ideas/comments are welcome!
EDIT
THANKS for all your effort!!! Here is a summary of the key points from all the replies below (and I'll give them a try):
1). Do not use large log files. Use relatively small ones.
2). Deal with the oldest ones periodically (either delete them, or zip them and move them to larger storage).
3). Implement run-time configurable logging strategy.
There are two important things to take note of:
Extremely large files are unwieldy. They are hard to transmit, hard to investigate, ...
Log files are mostly text, and text is compressible
In my experience, a simple way to deal with this is:
Only write small files: start a new file for a new session or when the current file grows past a preset limit (I have found 50 MB to be quite effective). To help locate the file in which the logs have been written, make the date and time of creation part of the file name.
Compress the logs, either offline (once the file is finished) or online (on the fly).
Put a cleaning routine in place: delete all files older than X days, or whenever you reach more than 10, 20 or 50 files, delete the oldest.
If you wish to keep the System and Error logs longer, you might duplicate them in a specific rotating file that only tracks them.
Put altogether, this gives the following log folder:
Log/
info.120229.081643.log.gz // <-- older file (to be purged soon)
info.120306.080423.log // <-- complete (50 MB) file started at login (to be compressed soon)
info.120306.131743.log // <-- current file
mon.120102.080417.log.gz // <-- older mon file
mon.120229.081643.log.gz // <-- older mon file
mon.120306.080423.log // <-- current mon file (System + Error only)
Depending on whether you can schedule (cron) the cleanup task, you may simply spin up a thread for cleanup within your application. Whether you go with a purge date or a number of files limit is a choice you have to make, either is effective.
Note: from experience, a 50 MB file ends up weighing around 10 MB when compressed on the fly and less than 5 MB when compressed offline (on-the-fly compression is less efficient).
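For illustration, a minimal sketch of the rotation part (same naming scheme and 50 MB limit as above; compression and cleanup would hang off the same roll-over point, and the class name is made up):
#include <cstdio>
#include <ctime>
#include <string>

// Size-based rotation: write through Log(), and start a new
// date-and-time-stamped file whenever the current one passes the limit.
class RotatingLog {
public:
    explicit RotatingLog(long limitBytes = 50L * 1024 * 1024)
        : limit_(limitBytes) { roll(); }

    ~RotatingLog() { if (file_) std::fclose(file_); }

    void Log(const std::string &line)
    {
        if (written_ > limit_) roll();          // limit reached: start a new file
        if (!file_) return;
        written_ += std::fprintf(file_, "%s\n", line.c_str());
    }

private:
    void roll()
    {
        if (file_) std::fclose(file_);          // here you would also queue the
                                                // finished file for compression
        char name[64];
        std::time_t t = std::time(nullptr);
        std::strftime(name, sizeof(name), "info.%y%m%d.%H%M%S.log",
                      std::localtime(&t));
        file_ = std::fopen(name, "a");
        written_ = 0;
    }

    std::FILE *file_ = nullptr;
    long limit_;
    long written_ = 0;
};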
Your (3) is standard practice in the world of UNIX system logging.
When a log file reaches a certain age or maximum size, start a new one
Zip or otherwise compress the old one
Throw away the nth-oldest compressed log
One way to deal with it is to rotate log files.
Start logging into a new file once you reach a certain size, and keep the last couple of log files before you start overwriting the first one.
You will not have all possible info but you will have at least some stuff leading up to the issue.
The logging strategy sounds unusual but you have your reasons.
I would
a) Make the level of detail in the log messages configurable at run time.
b) Create a new log file for each day. You can then get cron to compress them and/or delete them, or perhaps transfer them to off-line storage.
My answer is to write long logs and then pull out the info you want.
Compress them on a daily basis - but keep them for a week
I like to log a lot. In some programs I've kept the last n lines in memory and written to disk in case of an error or the user requesting support.
In one program it would keep the last 400 lines in memory and save them to a logging database upon an error. A separate service monitored this database and sent an HTTP request containing summary information to a service at our office, which added it to a database there.
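The keep-the-last-n-lines part is easy to do with a small fixed-size buffer, roughly like this (a sketch, not the actual code from that program; 400 lines as above):
#include <deque>
#include <fstream>
#include <string>

// Keep the last N log lines in memory; only write them out when something
// goes wrong (or the user asks for a support dump).
class CrashLogBuffer {
public:
    explicit CrashLogBuffer(std::size_t maxLines = 400) : maxLines_(maxLines) {}

    void Add(const std::string &line)
    {
        lines_.push_back(line);
        if (lines_.size() > maxLines_)
            lines_.pop_front();                  // drop the oldest line
    }

    void DumpOnError(const std::string &path) const
    {
        std::ofstream out(path, std::ios::app);
        for (const auto &l : lines_)
            out << l << '\n';
    }

private:
    std::size_t maxLines_;
    std::deque<std::string> lines_;
};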
We had a program on each of our desktop machines that showed a list (updated by F5) of issues, which we could assign to ourselves and mark as processed. But now I'm getting carried away :)
This worked very well to help us support many users at several customers. If an error occurred on a PDA somewhere running our software then within a minute or so we'd get a new item on our screens. We'd often phone a user before they realised they had a problem.
We had a filtering mechanism to automatically process or assign issues that we knew we'd fixed or didn't care much about.
In other programs I've had hourly or daily files which are deleted after n days either by the program itself or by a dedicated log cleaning service.

Running out of file descriptors for mmaped files despite high limits in multithreaded web-app

I have an application that mmaps a large number of files. 3000+ or so. It also uses about 75 worker threads. The application is written in a mix of Java and C++, with the Java server code calling out to C++ via JNI.
It frequently, though not predictably, runs out of file descriptors. I have upped the limits in /etc/security/limits.conf to:
* hard nofile 131072
/proc/sys/fs/file-max is 101752. The system is a Linode VPS running Ubuntu 8.04 LTS with kernel 2.6.35.4.
Opens fail from both the Java and C++ bits of the code after a certain point. Netstat doesn't show a large number of open sockets ("netstat -n | wc -l" is under 500). The number of open files shown by either lsof or /proc/{pid}/fd is about the expected 2000-5000.
This has had me grasping at straws for a few weeks (not constantly, but in flashes of fear and loathing every time I start getting notifications of things going boom).
There are a couple other loose threads that have me wondering if they offer any insight:
Since the process has about 75 threads, if the mmapped files were somehow taking up one file descriptor per thread, then the numbers would add up. That said, doing a recursive count on the entries in /proc/{pid}/tasks/*/fd currently lists 215575 fds, so it would seem that it should already be hitting the limits and it's not, so that seems unlikely.
Apache + Passenger are also running on the same box, and come in second for the largest number of file descriptors, but even with children none of those processes weigh in at over 10k descriptors.
I'm unsure where to go from there. Obviously something's making the app hit its limits, but I'm completely blank for what to check next. Any thoughts?
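(One thing that can help narrow it down is logging the process's own descriptor count from inside the app. A rough Linux-specific sketch that just counts the entries in /proc/self/fd:)
#include <dirent.h>
#include <cstdio>

// Count the open file descriptors of the current process by listing
// /proc/self/fd (Linux-specific).
static int CountOpenFds()
{
    DIR *dir = opendir("/proc/self/fd");
    if (!dir)
        return -1;

    int count = 0;
    while (struct dirent *entry = readdir(dir)) {
        if (entry->d_name[0] != '.')   // skip "." and ".."
            ++count;
    }
    closedir(dir);
    return count - 1;                  // minus the descriptor opendir itself uses
}

// e.g. log it periodically:  fprintf(stderr, "open fds: %d\n", CountOpenFds());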
So, from all I can tell, this appears to have been an issue specific to Ubuntu 8.04. After upgrading to 10.04, there hasn't been a single instance of this problem in a month. The configuration didn't change, so I'm led to believe that this must have been a kernel bug.
Your setup uses a huge chunk of code that may be guilty of leaking too: the JVM. Maybe you can switch between the Sun and the open-source JVMs as a way to check whether that code is by any chance the culprit. Also, there are different garbage collector strategies available for the JVM. Using a different one, or different heap sizes, will cause more or fewer garbage collections (which in Java include the closing of descriptors).
I know it's kinda far-fetched, but it seems like you've already followed all the other options ;)