I'm calling ntp_gettime() and it performs as expected; however, if I kill ntpd I still get the correct behaviour, with my return value showing no issues. This suggests ntp_gettime() does not call through to ntpd, which is what I had believed was happening.
I'm trying to check that ntpd is still running correctly and that it still has a valid connection. I now assume that ntpd updates the system clock at the defined interval and that ntp_gettime() only queries the system.
My question is: can ntp_gettime() be used to determine whether ntpd is running and the server connection is still valid, or have I just made a mistake somewhere?
If not, is there a way to do this?
The answer to your question is "no". This is not the interface to the NTP server, and cannot be used to determine the status of the NTP server. This is something completely different. According to the glibc documentation:
The ntp_gettime and ntp_adjtime functions provide an interface to
monitor and manipulate the system clock to maintain high accuracy
time. For example, you can fine tune the speed of the clock or
synchronize it with another time source.
A typical use of these functions is by a server implementing the
Network Time Protocol to synchronize the clocks of multiple systems
and high precision clocks.
In other words, this is what the NTP server itself uses to talk to the kernel, to take care of business.
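For illustration, a minimal sketch (assuming glibc on Linux) of reading that kernel state; the return value only reflects what the kernel was last told, which is why it does not change the moment ntpd dies:
#include <stdio.h>
#include <sys/timex.h>

int main(void) {
    struct ntptimeval ntv;
    int state = ntp_gettime(&ntv);      /* returns the kernel clock state */
    if (state == TIME_ERROR)
        printf("kernel clock not synchronised\n");
    else
        printf("clock state %d, max error %ld us\n", state, ntv.maxerror);
    return 0;
}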
Checking the daemon operation
ntpctl is a program that displays information about the running ntpd daemon. The command below shows how many servers the daemon is syncing with, whether the system clock is synced or unsynced, and the clock offset. For further information, see the ntpctl manpage: https://man.openbsd.org/ntpctl. The OpenNTPD package provides the ntpctl program.
ntpctl -s all
The way I ended up doing it is by using a pipe to make an ntpstat call and processing the output.
That is, I check for "unsynchronised", then for "synchronised", and then for the absence of "synchronised" (after the unsynchronised check), and act accordingly.
If there is a better way to get the same result, because I think this is messy, please let me know.
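For reference, a minimal sketch of that approach in C; note that ntpstat also encodes the state in its exit status (0 synchronised, 1 unsynchronised, 2 indeterminate), which may be less messy than scanning the text:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void) {
    FILE *p = popen("ntpstat", "r");    /* run ntpstat, read its output */
    if (!p)
        return 2;
    char line[256];
    while (fgets(line, sizeof(line), p))
        fputs(line, stdout);            /* or scan for "synchronised" here */
    int status = pclose(p);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* 0=synced, 1=unsynced, 2=unknown */
    return 2;
}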
I have a kind of complex application which uses heavy IO: it includes ffmpeg and does video transcoding (in software; no HW acceleration is available).
This might be unimportant, but I wanted to emphasize it.
All video transcoding functions run on their own std::thread and use libev for IO management.
Hardware details:
CPU architecture is ARM
OS is Debian.
Now I'm trying to use ping to check whether a particular server is available. I found ping's source code and included it in the application. It runs on a completely different std::thread.
Problem:
If there is no video transcoding in progress, the ping function works as intended.
However, when there is CPU-intensive transcoding, the ping function returns with a timeout 99% of the time.
I suspected some kind of IO starvation, so I dived deep into ping's source:
I found out that ping uses the old select() call to detect whether IO is available. I was almost sure this was causing the problem, so I refactored the code: I dropped select() and put libev into action.
Unfortunately, the starvation stays the same.
I have almost accepted this, as the video transcoding really puts a huge load onto the CPU (70-80%).
But if I run my application from SSH session #A and run ping from another SSH session #B, my application can do the transcoding, and there is not a single lost packet from ping in session #B.
Conclusion:
The hardware seems capable of running my heavy application in parallel with ping.
Question:
I am really curious about the following:
Is there a per-process limit on Linux for how much IO a process can use? (I assume there is, but how can I find this limit? How can I raise it? How can I check current usage?)
If there is no problem with an IO limit, what else can cause this kind of starvation between std::threads? (The ping thread does not seem to be blocked, as it receives the timeout; it just never gets a free IO operation.) More precisely: ping actually CAN send out packets, but the replies do not seem to arrive. I am almost sure the ping replies are coming back; my application is just not getting a green light for the READ operation.
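On the first point, the closest thing Linux has to a per-process IO limit is the file-descriptor limit, RLIMIT_NOFILE. A minimal sketch of checking and raising it (current usage can be seen with ls /proc/<PID>/fd | wc -l):
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);              /* read the current fd limit */
    printf("soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    rl.rlim_cur = rl.rlim_max;                  /* raise soft limit to the hard cap */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");
    return 0;
}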
How can I execute a script (in my case, one that copies logs to flash or copies them remotely) before the watchdog fires?
Should I modify the Linux kernel watchdog driver? If so, where?
Or maybe it is possible to configure this somehow via:
/etc/default/watchdog
/etc/watchdog.conf
However, we have BusyBox installed, where watchdog configuration is limited.
I cannot find anything on Google, which is surprising, as this is a basic problem that needs solving: everybody wants logs preserved in persistent memory (flash, not the /var/log/ path) after a watchdog reset.
Of course, copying logs to flash from time to time in the normal device lifecycle is not a good idea; there should be a way to do this when the timeout on feeding /dev/watchdog expires.
On a Linux kernel newer than 4.9 you have the pretimeout governor framework available, which would allow you to write a kernel driver that reacts when a pre-timeout is detected. A solution like that is well beyond the scope of a simple question and answer, so I'm leaving my original answer to stand.
TL;DR:
If the problem is detectable while the OS is still running, you can flush the logs. If the problem is caused by the OS locking up, then you won't have an opportunity to fix the issue, as the hardware will reset the box.
There are two things here:
Watchdog device
Watchdog program
The watchdog device is typically a hardware timer that will do 'something specifically low level' when its timer expires. The most common low-level action is to reset the box. There is no OS involvement in this if it happens in hardware. You will have no opportunity to do anything high level once that timer runs out, e.g. writing log files somewhere.
The watchdog program is a tool that reassures the watchdog device periodically, as long as its check conditions are met.
The BusyBox watchdog's check condition is a simple loop (pseudocode):
while (1) {
    // reassure the watchdog
    // sleep some time
}
so if the program stops running - e.g. by an OS lockup or termination of the program then the underlying hardware will simply kick the box.
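Concretely, a feeding loop against the standard /dev/watchdog interface looks roughly like this (a minimal sketch; the 10-second period is an assumption and must stay below the configured timeout):
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/watchdog", O_WRONLY);   /* opening arms the timer */
    if (fd < 0)
        return 1;
    for (;;) {
        write(fd, "\0", 1);                     /* feed: restarts the countdown */
        sleep(10);                              /* must beat the timeout */
    }
}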
The 'bigger' watchdog binary provides a bunch of checks, and if they fail it will run the repair-binary configured in /etc/watchdog.conf to try to recover. This would be a potential point at which to flush the logs.
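For example, a hypothetical /etc/watchdog.conf fragment pointing repair-binary at a log-flushing script (the script path is an assumption):
repair-binary = /usr/local/sbin/flush-logs.sh
repair-timeout = 60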
The Zookeeper Watches documentation states:
"A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode." Furthermore, "Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper."
The point is, there is no guarantee you'll get a watch notification.
This is important because, in a system like Clojure's Avout, you're trying to mimic Clojure's software transactional memory over the network using ZooKeeper. This relies on there being a watch notification for every change.
Now I'm trying to work out whether this is a coding flaw or a fundamental computer science problem (i.e. the CAP theorem).
My question is: Does the Zookeeper Watches system have a bug, or is this a limitation of the CAP theorem?
This seems to be a limitation in the way ZooKeeper implements watches, not a limitation of the CAP theorem. There is an open feature request to add continuous watch to ZooKeeper: https://issues.apache.org/jira/browse/ZOOKEEPER-1416.
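The usual mitigation is to re-arm the watch in the same call that re-reads the data, so the client always converges on the latest state even if intermediate changes go unobserved. A minimal sketch, assuming the ZooKeeper C client:
#include <zookeeper/zookeeper.h>

static void node_watcher(zhandle_t *zh, int type, int state,
                         const char *path, void *ctx) {
    if (type == ZOO_CHANGED_EVENT) {
        char buf[1024];
        int len = sizeof(buf);
        /* re-read the data and register a new watch in one call */
        zoo_wget(zh, path, node_watcher, ctx, buf, &len, NULL);
        /* buf now holds the latest state, not necessarily the
           change that triggered this event */
    }
}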
etcd has a watch function that uses long polling. The limitation you need to account for is that multiple events may happen between receiving the first long-poll result and re-polling. This is roughly analogous to the issue with ZooKeeper. However, they have a solution:
However, the watch command can do more than this. Using the index [passing the last index we've seen], we can watch for commands that have happened in the past. This is useful for ensuring you don't miss events between watch commands.
curl -L 'http://127.0.0.1:4001/v2/keys/foo?wait=true&waitIndex=7'
I've got a large C++ function in Linux that calls a whole lot of other functions, making up an algorithm. At various points, given certain bad inputs, the algorithm can get "stuck" and run forever. Adding a timeout seems appropriate, as all potential "stuck" points cannot be predicted. But despite scouring the Internet for timeout examples, I've only found how to apply timeouts when either the thing you're timing runs in a separate thread or it's reading inputs. My code is single-threaded and does not touch file descriptors, so I'm not having any luck. Do I basically have no choice but to thread it?
I am not sure about your situation; server applications and embedded applications often run for years in the background without stopping. One option is to let your program run in the background and log to a file (or the screen) periodically. If you really want to stop the program after a certain time, you can use the timeout command or a script to kill your program after that time, e.g. timeout 15s your-prog.
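If you would rather stay single-threaded and in-process, here is a sketch of the same kill-after-N-seconds idea using alarm(2) and siglongjmp (run_algorithm is a hypothetical stand-in for the stuck code, and this only suits algorithms that can be abandoned mid-flight):
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

static void on_alarm(int sig) {
    (void)sig;
    siglongjmp(timeout_env, 1);     /* unwind back to sigsetjmp */
}

static void run_algorithm(void) {   /* hypothetical: may loop forever */
    for (;;)
        ;
}

int main(void) {
    signal(SIGALRM, on_alarm);
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(15);                  /* give the algorithm 15 seconds */
        run_algorithm();
        alarm(0);                   /* finished in time: cancel the alarm */
    } else {
        fprintf(stderr, "algorithm timed out\n");
    }
    return 0;
}
Note that jumping out of a signal handler abandons whatever the algorithm was holding (memory, locks, open files), so the timeout-command approach above is the safer default.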
I have two applications running on my machine. One is supposed to hand in the work and the other is supposed to do the work. How can I make sure that the first application/process is in a wait state? I can check the resources it's consuming, but that is no guarantee. What tools should I use?
Your two applications should communicate. There are a lot of ways to do that:
Send messages through sockets. This way the 2 processes can run on different machines if you use normal network sockets instead of local ones.
If you are using C you can use semaphores with semget/semop/semctl. There should be interfaces for that in other languages.
Named pipes: opening one blocks until both a reader and a writer have it open, and reads block until data arrives. You can use that for synchronisation (see the sketch after this list).
Signals are also good for this; in C you send them with kill() and handle them with sigaction().
DBUS can also be used and has bindings for various languages.
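A minimal sketch of the named-pipe option on the reader side (the path /tmp/syncpipe is an assumption; the other process opens it with O_WRONLY and writes one byte to release the reader):
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    mkfifo("/tmp/syncpipe", 0666);              /* create the FIFO if absent */
    int fd = open("/tmp/syncpipe", O_RDONLY);   /* blocks until a writer opens */
    char buf;
    read(fd, &buf, 1);                          /* blocks until the peer writes */
    close(fd);
    return 0;                                   /* released by the writer */
}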
Update: if you can't modify the processing application, then it is harder. You have to rely on signs that indicate progress. (I am assuming your processing application reads a file, does some processing, then writes the result to an output file.) Do you know the final size the result should be? If so, you need to check the size repeatedly (or whenever it changes).
If you don't know the size but you know how the processing works, you may be able to use that. For example, the processing is done when the output file is closed. You can use strace to see all the system calls, including the close. You can also interpose your own close() function via the LD_PRELOAD environment variable (on Windows you would have to replace DLLs). This way you can, in a sense, modify the processing program without recompiling it or even having access to its source.
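A minimal sketch of such an interposer (close_spy.c and processing-app are hypothetical names):
/* build: gcc -shared -fPIC -o close_spy.so close_spy.c -ldl
   run:   LD_PRELOAD=./close_spy.so ./processing-app */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int close(int fd) {
    static int (*real_close)(int);
    if (!real_close)                             /* look up libc's close once */
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
    fprintf(stderr, "close(%d)\n", fd);          /* report progress here */
    return real_close(fd);
}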
You can use named pipes: the first app reads from the pipe, and since the pipe is empty it keeps waiting (blocked). The second app writes into it when it wants the first one to continue.
Nothing can guarantee that your application is in a waiting state. You have to pass it some work and get back a response. It might be transactional or not: the application can confirm that it got the message before it starts processing, or after it has processed it (successfully or not). If it is not waiting, passing a piece of work should fail, whether when trying to write to a TCP/IP socket or by other means, or because a timeout occurs. This depends on the implementation, the kind of transport you are using, and other requirements.
There is actually a way of figuring out whether a process (thread) is in a blocking state, waiting for data on a socket (or other source), but that requires the client to be on the same computer and to have the necessary access privileges, and it makes no sense outside of debugging, which you can do with any debugger anyway.
Overall, the idea of making sure the application is waiting for data before trying to pass it that data smells bad. Not to mention the race condition: what if you checked and it was OK, but when you actually tried to send the data the application was no longer waiting (even if the gap is only microseconds)?