I have a C++ script, which checks whether any action has to be done and if so it starts the right processor C++ script. However, since it runs every x minutes it also checks whether the processor isn't still running using lock files.
I use the following function to acquire the lock:
int LockFile(string FileNameToLock) {
FileNameToLock += ".lock";
int fd = open(FileNameToLock.c_str(), O_RDWR | O_CREAT, 0666);
int rc = flock(fd, LOCK_EX | LOCK_NB);
if (rc || rc == -1) {
cout << errno << endl;
cout << strerror(errno) << endl;
return -1;
}
return fd;
}
The code that is being executed:
[...]
if (LockFile(ExecuteFileName, Extra) == -1) {
cout << "Already running!" << endl; //MAIN IS ALREADY RUNNING
//RETURNS ME Resource temporarily unavailable when processor is running from an earlier run
exit(EXIT_SUCCESS);
}
if (StartProcessor) { //PSEUDO
int LockFileProcessor = LockFile("Processor");
if (LockFileProcessor != -1) {
string Command = "nohup setsid ./Processor"; //NOHUP CREATES ZOMBIE?
Command += IntToString(result->getInt("Klantnummer"));
Command += " > /dev/null 2>&1 &"; //DISCARD OUTPUT
system(Command.c_str());
//STARTS PROCESSOR (AS ZOMBIE?)
}
}
The first run works well, however when the main script runs again, the lock file returns -1, which means an error occurred (only when the processor is still running). The errno is 11 which results in the error message: Resource temporarily unavailable. Note that this only happens when the (zombie) processor is still running. (However, the main script has already terminated, which should close the file handle?)
For some reason, the zombie keeps the file handle to the lock file of the main script open???
I have no idea where to look for this problem.
SOLVED:
see my answer
No, 11 is EAGAIN/EWOULDBLOCK which simply means that you cannot acquire the lock because the resource is already locked (see the documentation). You received that error (instead of blocking behaviour) due to LOCK_NB flag.
EDIT: After some discussion it seems that the problem is due to flock() locks being preserved when subprocessing. To avoid this issue I recommend not using flock() for the lifetime but instead touch-and-delete-at-exit strategy:
If file.lock exists then exit
Otherwise create file.lock and start processing
Delete file.lock at exit.
Of course there's a race condition here. In order to make it safe you would need another file with flock:
flock(common.flock)
If file.lock exists then exit
Otherwise create file.lock
Unlock flock(common.flock)
Start processing
Delete file.lock at exit
But this only matters if you expect simultaneous calls to main. If you don't (you said that a cron starts the process every 10min, no race here) then stick to the first version.
Side note: here's a simple implementation of such (non-synchronized) file lock:
#include <string>
#include <fstream>
#include <stdexcept>
#include <cstdio>
// for sleep only
#include <chrono>
#include <thread>
class FileLock {
public:
FileLock(const std::string& path) : path { path } {
if (std::ifstream{ path }) {
// You may want to use a custom exception here
throw std::runtime_error("Already locked.");
}
std::ofstream file{ path };
};
~FileLock() {
std::remove(path.c_str());
};
private:
std::string path;
};
int main() {
// This will throw std::runtime_error if test.xxx exists
FileLock fl { "./test.xxx" };
std::this_thread::sleep_for(std::chrono::seconds { 5 });
// RAII: no need to delete anything here
return 0;
};
Requires C++11. Note that this implementation is not race-condition-safe, i.e. you generally need to flock() the constructor but in this situation it probably be fine (i.e. when you don't start main in parallel).
I solved this issue, since the system and fork commands seem to pass on the flock, by saving the command to execute in a vector. Unlock the lock file right before looping the Commands vector and execute each command. This leaves the main with a very tiny gap of running while not locked, but for now it seems to work great. This also supports ungraceful terminations.
[...]
if (LockFile(ExecuteFileName, Extra) == -1) {
cout << "Already running!" << endl; //MAIN IS ALREADY RUNNING
//RETURNS ME Resource temporarily unavailable when processor is running from an earlier run
exit(EXIT_SUCCESS);
}
vector<string> Commands;
if (StartProcessor) { //PSEUDO
int LockFileProcessor = LockFile("Processor");
if (LockFileProcessor != -1) {
string Command = "nohup setsid ./Processor"; //NOHUP CREATES ZOMBIE
Command += IntToString(result->getInt("Klantnummer"));
Command += " > /dev/null 2>&1 &"; //DISCARD OUTPUT
Commands.push_back(Command);
}
}
//UNLOCK MAIN
if (UnlockFile(LockFileMain)) {
for(auto const& Command: Commands) {
system(Command.c_str());
}
}
Related
I'm writing unit tests for a function that may lock a file (using fcntl(fd, F_SETLK, ...)) under some conditions.
I want my unit test to be able to EXPECT that the file is or is not locked at certain points. But I can't find any way to test that. I've tried using F_GETLK, but it will only tell you if the lock would not be able to be placed. Since a given process can re-lock the same file as often as it wants, F_GETLK is returning F_UNLCK, indicating the file is unlocked.
For instance, if I run the following little program:
int main(int argc, char** argv) {
int fd = open("/tmp/my_test_file", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
if (fd < 0) {
return EXIT_FAILURE;
}
// Initial lock
struct flock lock;
lock.l_type = F_WRLCK;
lock.l_whence = SEEK_SET;
lock.l_start = 0;
lock.l_len = 0; // Lock entire file.
if (fcntl(fd, F_SETLK, &lock) < 0) {
return EXIT_FAILURE;
}
// Test lock:
lock.l_type = F_WRLCK;
lock.l_pid = 0;
if (fcntl(fd, F_GETLK, &lock) < 0) {
return EXIT_FAILURE;
}
switch (lock.l_type) {
case F_WRLCK:
std::cout << lock.l_pid << " is holding the lock\n";
break;
case F_UNLCK:
std::cout << "File is unlocked\n";
break;
default:
std::cout << "Unexpected " << lock.l_type << "\n";
break;
}
return EXIT_SUCCESS;
}
It will print:
File is unlocked
So, is there a way for a process to test if it is holding an fcntl file lock?
Also, are there other kinds of (Linux-portable!) file locks I could be using that would solve my problem?
Well I am not aware of any "already available library” for this, but on implementation level I would suggest you to have a log file which keeps track of that.
You can simply create a file named “log”, mmap(2) it as MAP_SHARED into each process accessing the file and whenever a file lock succeeds, write the current process’ pid at the end of that log file maintaining offset to the end with CAS. That will help you analyse order of locks.
Maybe simply by opening file in append more and writing to the end, the pid of current process.
Or maybe a quicker way to do so such a test is to create a fifo file by mkfifo(2) and write to that file.
I don't have enough reputation points to comment to ask if they solved the problem originally stated here. I have the same problem of the program crashing in construction of an ofstream.
It is not multi-threaded access, but it can be called in quick succession. I believe it crashes on the 2nd time. I use scope to ensure the stream object is destroyed.
Why would this happen?
I tried std::fopen too. It also results in a crash.
Here is the code using the ofstream:
static bool writeConfigFile (const char * filename, const Config & cfg)
{
logsPrintLine(_SLOG_SETCODE(_SLOGC_CONFIG, 0), _SLOG_INFO, "Write config file (%s stream)", filename);
ofstream os(filename); // FIXME: Crashes here creating ofstream 2nd time
if (os.good())
{
// Uses stream insertion operator to output attributes to stream
// in human readable form (about 2kb)
outputConfig(cfg, os);
if (!os.good())
{
logsPrintLine(_SLOG_SETCODE(_SLOGC_CONFIG, 0), _SLOG_NOTICE, "Failed to write configuration file (%s)", filename);
return false;
}
logsPrintLine(_SLOG_SETCODE(_SLOGC_CONFIG, 0), _SLOG_INFO, "Configuration written to file (%s)", filename);
return true;
}
logsPrintLine(_SLOG_SETCODE(_SLOGC_CONFIG, 0), _SLOG_NOTICE, "Cannot write configuration file (%s)", filename);
return false;
}
/**
* Called when configuration settings have been read/received and validated
* #return true if successfully set, and written to file
*/
bool Config::set (SysConfigSource source, const struct SCADA_dsconfig * p)
{
Lock lock(mtxSet); // This is locking a mutex on construction of the lock. Release it on destruction.
// Setup the non-current one to switch to
Config * pCfg = pConfig.other();
unsigned i, f, n = 0;
// set attributes in pCfg based on the config received
// and some constants ...
pCfg->setWritten(writeConfigFile("test.conf", *pCfg));
if (!pCfg->isWritten())
{
// Don't set system config status here. Existing one still in use.
logsPrintLine(_SLOG_SETCODE(_SLOGC_CONFIG, 0), _SLOG_NOTICE, "Config file not written. Retain prior config.");
return false;
}
pConfig.swap(); // switch-in the new config
setSystemConfigSource(source);
toSyslog(pCfg);
notifyConfigChange();
return true;
}
Maybe post a segment of your source code in order to get an idea of where it went wrong.
Here is a very basic segment of code of how I would use fstream.. hope you will find it helpful.
#include <iostream>
#include <fstream>
#include <string>
int main() {
while (1) {
std::string testString;
std::ofstream outFile;
outFile.open("Test", std::ios_base::app); // Appends to file, does not delete existing code
std::cout << "Enter a string: ";
std::cin >> testString;
outFile << testString << std::endl;
outFile.close();
}
}
It turned out to be a device driver bus master issue. Add "ahci nobmstr" when launching devb-ahci.
Derived via http://www.qnx.com/developers/docs/qnxcar2/index.jsp?topic=%2Fcom.qnx.doc.neutrino.user_guide%2Ftopic%2Fhardware_Troubleshooting_devb-eide.html
In this thread I solved a part of my cases and thanks to NathanOliver I've got to the following code so far:
int main(){
//...
bool proc1 = false, proc2 = false, proc3 = false, proc4 = false,
while(true) {
if(!proc1 && ProcessRunning("process1.exe")){
fun1("fun1.bat");
proc1 = true;
}
if(!proc2 && ProcessRunning("process2.exe")){
fun1("fun2.bat");
proc2 = true;
}
if(!proc3 && ProcessRunning("process3.exe")){
fun1("fun3.bat");
proc3 = true;
}
if(!proc4 && ProcessRunning("process4.exe")){
fun1("fun4.bat");
proc4 = true;
}
}
return 0;
}
What I still can't get through is the case where:
double click on app1 -> process1 starts.
while process1 is running I double click on app2 so that process2 should have the same behaviour as I mentioned in my first thread:
if it finds process2 (the second if(){}), it creates that .bat file and
it executes it (kill process2(it might existed before i opened it), start > it again, delete the .bat file generated by fun2(const char name[]){}).
Summary of previous post:
int fun1(const char name[]){
ofstream file;
file.open(name, ios::out);
//start of what I write in .bat
file << "#echo off";
file << "cd to specific path";
file << "taskkill /im process.exe* /f";
file << "start process.exe";
file << "del \"%~f0\"";
file.close();
return system(name);
}
Exactly the same for the rest functions.
I believe you run the .bat file as sync program, therefore until the bat won't finish and return exit code (which you may check as return value of system function) your main program won't continue to run. You may use async process using fork on linux based systems and CreateProcess on windows OS.
How about this? (C++11 Standard)
bool proc[] = {false, false, false, false};
while(true)
{
for(int i=0 ; i<4 ; i++)
{
const std::string exeName = "process" + std::to_string(i) + ".exe";
const std::string funName = "fun" + std::to_string(i) + ".bat";
if(!proc[i] && ProcessRunning(exeName.c_str()))
{
fun1(funName.c_str());
proc[i] = true;
}
}
Should make things a bit more flexible, but getting to your problem. If-condition compares the first expression proc and then ProcessRunning(...). Both must be true in order to enter the control block.
But your file explaining fun1() seems to have an issue at least in comments. You mention for example to //taskkill process.exe but it has to be process#.
I can help you more on that if you provide the full fun1 implementation or at least the most relevant parts. Your cycle though should work. The problem has to be in fun1(...)
P.D.: I also would rename fun1(...) to something that makes more sense.
I am launching a command using system api (I am ok with using this api with C/C++). The command I pass may hang at times and hence I would like to kill after certain timeout.
Currently I am using it as:
system("COMMAND");
I want to use it something like this:
Run a command using a system independent API (I don't want to use CreateProcess since it is for Windows only) Kill the process if it does not exit after 'X' Minutes.
Since system() is a platform-specific call, there cannot be a platform-independent way of solving your problem. However, system() is a POSIX call, so if it is supported on any given platform, the rest of the POSIX API should be as well. So, one way to solve your problem is to use fork() and kill().
There is a complication in that system() invokes a shell, which will probably spawn other processes, and I presume you want to kill all of them, so one way to do that is to use a process group. The basic idea is use fork() to create another process, place it in its own process group, and kill that group if it doesn't exit after a certain time.
A simple example - the program forks; the child process sets its own process group to be the same as its process ID, and uses system() to spawn an endless loop. The parent process waits 10 seconds then kills the process group, using the negative value of the child process PID. This will kill the forked process and any children of that process (unless they have changed their process group.)
Since the parent process is in a different group, the kill() has no effect on it.
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
int main() {
pid_t pid;
pid = fork();
if(pid == 0) { // child process
setpgid(getpid(), getpid());
system("while true ; do echo xx ; sleep 5; done");
} else { // parent process
sleep(10);
printf("Sleep returned\n");
kill(-pid, SIGKILL);
printf("killed process group %d\n", pid);
}
exit(0);
}
There is no standard, cross-platform system API. The hint is that they are system APIs! We're actually "lucky" that we get system, but we don't get anything other than that.
You could try to find some third-party abstraction.
Check below C++ thread based attempt for linux. (not tested)
#include <iostream>
#include <string>
#include <thread>
#include <stdio.h>
using namespace std;
// execute system command and get output
// http://stackoverflow.com/questions/478898/how-to-execute-a-command-and-get-output-of-command-within-c
std::string exec(const char* cmd) {
FILE* pipe = popen(cmd, "r");
if (!pipe) return "ERROR";
char buffer[128];
std::string result = "";
while(!feof(pipe)) {
if(fgets(buffer, 128, pipe) != NULL)
result += buffer;
}
pclose(pipe);
return result;
}
void system_task(string& cmd){
exec(cmd.c_str());
}
int main(){
// system commad that takes time
string command = "find /";
// run the command in a separate thread
std::thread t1(system_task, std::ref(command));
// gives some time for the system task
std::this_thread::sleep_for(chrono::milliseconds(200));
// get the process id of the system task
string query_command = "pgrep -u $LOGNAME " + command;
string process_id = exec(query_command.c_str());
// kill system task
cout << "killing process " << process_id << "..." << endl;
string kill_command = "kill " + process_id;
exec(kill_command.c_str());
if (t1.joinable())
t1.join();
cout << "continue work on main thread" << endl;
return 0;
}
I had a similar problem, in a Qt/QML development: I want to start a bash command, while continuing to process events on the Qt Loop, and killing the bash command if it takes too long.
I came up with the following class that I'm sharing here (see below), in hope it may be of some use to people with a similar problem.
Instead of calling a 'kill' command, I call a cleanupCommand supplied by the developper. Example: if I'm to call myscript.sh and want to check that it won't last run for more than 10 seconds, I'll call it the following way:
SystemWithTimeout systemWithTimeout("myScript.sh", 10, "killall myScript.sh");
systemWithTimeout.start();
Code:
class SystemWithTimeout {
private:
bool m_childFinished = false ;
QString m_childCommand ;
int m_seconds ;
QString m_cleanupCmd ;
int m_period;
void startChild(void) {
int rc = system(m_childCommand.toUtf8().data());
if (rc != 0) SYSLOG(LOG_NOTICE, "Error SystemWithTimeout startChild: system returned %d", rc);
m_childFinished = true ;
}
public:
SystemWithTimeout(QString cmd, int seconds, QString cleanupCmd)
: m_childFinished {false}, m_childCommand {cmd}, m_seconds {seconds}, m_cleanupCmd {cleanupCmd}
{ m_period = 200; }
void setPeriod(int period) {m_period = period;}
void start(void) ;
};
void SystemWithTimeout::start(void)
{
m_childFinished = false ; // re-arm the boolean for 2nd and later calls to 'start'
qDebug()<<"systemWithTimeout"<<m_childCommand<<m_seconds;
QTime dieTime= QTime::currentTime().addSecs(m_seconds);
std::thread child(&SystemWithTimeout::startChild, this);
child.detach();
while (!m_childFinished && QTime::currentTime() < dieTime)
{
QTime then = QTime::currentTime();
QCoreApplication::processEvents(QEventLoop::AllEvents, m_period); // Process events during up to m_period ms (default: 200ms)
QTime now = QTime::currentTime();
int waitTime = m_period-(then.msecsTo(now)) ;
QThread::msleep(waitTime); // wait for the remaning of the 200 ms before looping again.
}
if (!m_childFinished)
{
SYSLOG(LOG_NOTICE, "Killing command <%s> after timeout reached (%d seconds)", m_childCommand.toUtf8().data(), m_seconds);
int rc = system(m_cleanupCmd.toUtf8().data());
if (rc != 0) SYSLOG(LOG_NOTICE, "Error SystemWithTimeout 164: system returned %d", rc);
m_childFinished = true ;
}
}
I do not know any portable way to do that in C nor C++ languages. As you ask for alternatives, I know it is possible in other languages. For example in Python, it is possible using the subprocess module.
import subprocess
cmd = subprocess.Popen("COMMAND", shell = True)
You can then test if COMMAND has ended with
if cmd.poll() is not None:
# cmd has finished
and you can kill it with :
cmd.terminate()
Even if you prefere to use C language, you should read the documentation for subprocess module because it explains that internally it uses CreateProcess on Windows and os.execvp on Posix systems to start the command, and it uses TerminateProcess on Windows and SIG_TERM on Posix to stop it.
I have the helper function below, used to execute a command and get the return value on posix systems. I used to use popen, but it is impossible to get the return code of an application with popen if it runs and exits before popen/pclose gets a chance to do its work.
The following helper function creates a process fork, uses execvp to run the desired external process, and then the parent uses waitpid to get the return code. I'm seeing odd cases where it's refusing to run.
When called with wait = true, waitpid should return the exit code of the application no matter what. However, I'm seeing stdout output that specifies the return code should be non-zero, yet the return code is zero. Testing the external process in a regular shell, then echoing $? returns non-zero, so it's not a problem w/ the external process not returning the right code. If it's of any help, the external process being run is mount(8) (yes, I know I can use mount(2) but that's besides the point).
I apologize in advance for a code dump. Most of it is debugging/logging:
inline int ForkAndRun(const std::string &command, const std::vector<std::string> &args, bool wait = false, std::string *output = NULL)
{
std::string debug;
std::vector<char*> argv;
for(size_t i = 0; i < args.size(); ++i)
{
argv.push_back(const_cast<char*>(args[i].c_str()));
debug += "\"";
debug += args[i];
debug += "\" ";
}
argv.push_back((char*)NULL);
neosmart::logger.Debug("Executing %s", debug.c_str());
int pipefd[2];
if (pipe(pipefd) != 0)
{
neosmart::logger.Error("Failed to create pipe descriptor when trying to launch %s", debug.c_str());
return EXIT_FAILURE;
}
pid_t pid = fork();
if (pid == 0)
{
close(pipefd[STDIN_FILENO]); //child isn't going to be reading
dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd
dup2(pipefd[STDOUT_FILENO], STDERR_FILENO);
if (execvp(command.c_str(), &argv[0]) != 0)
{
exit(EXIT_FAILURE);
}
return 0;
}
else if (pid < 0)
{
neosmart::logger.Error("Failed to fork when trying to launch %s", debug.c_str());
return EXIT_FAILURE;
}
else
{
close(pipefd[STDOUT_FILENO]);
int exitCode = 0;
if (wait)
{
waitpid(pid, &exitCode, wait ? __WALL : (WNOHANG | WUNTRACED));
std::string result;
char buffer[128];
ssize_t bytesRead;
while ((bytesRead = read(pipefd[STDIN_FILENO], buffer, sizeof(buffer)-1)) != 0)
{
buffer[bytesRead] = '\0';
result += buffer;
}
if (wait)
{
if ((WIFEXITED(exitCode)) == 0)
{
neosmart::logger.Error("Failed to run command %s", debug.c_str());
neosmart::logger.Info("Output:\n%s", result.c_str());
}
else
{
neosmart::logger.Debug("Output:\n%s", result.c_str());
exitCode = WEXITSTATUS(exitCode);
if (exitCode != 0)
{
neosmart::logger.Info("Return code %d", (exitCode));
}
}
}
if (output)
{
result.swap(*output);
}
}
close(pipefd[STDIN_FILENO]);
return exitCode;
}
}
Note that the command is run OK with the correct parameters, the function proceeds without any problems, and WIFEXITED returns TRUE. However, WEXITSTATUS returns 0, when it should be returning something else.
Probably isn't your main issue, but I think I see a small problem. In your child process, you have...
dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd
dup2(pipefd[STDOUT_FILENO], STDERR_FILENO); //but wait, this pipe is closed!
But I think what you want is:
dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
dup2(pipefd[STDOUT_FILENO], STDERR_FILENO);
close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd for both, can close
I don't have much experience with forks and pipes in Linux, but I did write a similar function pretty recently. You can take a look at the code to compare, if you'd like. I know that my function works.
execAndRedirect.cpp
I'm using the mongoose library, and grepping my code for SIGCHLD revealed that using mg_start from mongoose results in setting SIGCHLD to SIG_IGN.
From the waitpid man page, on Linux a SIGCHLD set to SIG_IGN will not create a zombie process, so waitpid will fail if the process has already successfully run and exited - but will run OK if it hasn't yet. This was the cause of the sporadic failure of my code.
Simply re-setting SIGCHLD after calling mg_start to a void function that does absolutely nothing was enough to keep the zombie records from being immediately erased.
Per #Geoff_Montee's advice, there was a bug in my redirect of STDERR, but this was not responsible for the problem as execvp does not store the return value in STDERR or even STDOUT, but rather in the kernel object associated with the parent process (the zombie record).
#jilles' warning about non-contiguity of vector in C++ does not apply for C++03 and up (only valid for C++98, though in practice, most C++98 compilers did use contiguous storage, anyway) and was not related to this issue. However, the advice on reading from the pipe before blocking and checking the output of waitpid is spot-on.
I've found that pclose does NOT block and wait for the process to end, contrary to the documentation (this is on CentOS 6). I've found that I need to call pclose and then call waitpid(pid,&status,0); to get the true return value.