Race condition with Performance Counters for current process

Race condition with Performance Counters for current process - c++

I am trying to get around the old "How do I get a Windows Performance Counter for the current process" issue. Basically I am enumerating Process Object instances to get a list of Process objects that I can then query for their process id and compare to my own.
Based on this I can build a performance counter path using the correct instance index (to create something similar to \Process(my_program#3)\<counter>) that I can then use to query whatever counter it is that I am interested in. But what happens if one or more of the other instances of my_program exit prior to the PdhAddCounter call? If I understand correctly, this would mean that my counter path now refers to a different process or is now invalid. They might even disappear while querying for the process id...
How do I prevent the counter path from becoming invalid before I can use it to get a counter handle?

Wow, you are right. This seems like a major design flaw to me. Basically it is impossible to reliably monitor an instance if it's name is not unique. I did stumble across a workaround specifically for the Process and Thread objects, but that's a global setting that could affect other applications.
I think the safest way to do this would be to watch all process objects, and each time you collect the data go through and find the one with the desired Process Id.

Related

Notifying a task from multiple other tasks without extra work

My application is futures-based with async/await, and has the following structure within one of its components:
a "manager", which is responsible for starting/stopping/restarting "workers", based both on external input and on the current state of "workers";
a dynamic set of "workers", which perform some continuous work, but may fail or be stopped externally.
A worker is just a spawned task which does some I/O work. Internally it is a loop which is intended to be infinite, but it may exit early due to errors or other reasons, and in this case the worker must be restarted from scratch by the manager.
The manager is implemented as a loop which awaits on several channels, including one returned by async_std::stream::interval, which essentially makes the manager into a poller - and indeed, I need this because I do need to poll some Mutex-protected external state. Based on this state, the manager, among everything else, creates or destroys its workers.
Additionally, the manager stores a set of async_std::task::JoinHandles representing live workers, and it uses these handles to check whether any workers has exited, restarting them if so. (BTW, I do this currently using select(handle, future::ready()), which is totally suboptimal because it relies on the select implementation detail, specifically that it polls the left future first. I couldn't find a better way of doing it; something like race() would make more sense, but race() consumes both futures, which won't work for me because I don't want to lose the JoinHandle if it is not ready. This is a matter for another question, though.)
You can see that in this design workers can only be restarted when the next poll "tick" in the manager occurs. However, I don't want to use a too small interval for polling, because in most cases polling just wastes CPU cycles. Large intervals, however, can delay restarting a failed/canceled worker by too much, leading to undesired latencies. Therefore, I though I'd set up another channel of ()s back from each worker to the manager, which I'd add to the main manager loop, so when a worker stops due to an error or otherwise, it will first send a message to its channel, resulting in the manager being woken up earlier than the next poll in order to restart the worker right away.
Unfortunately, with any kinds of channels this might result in more polls than needed, in case two or more workers stop at approximately the same time (which due to the nature of my application, is somewhat likely to happen). In such case it would make sense to only run the manager loop once, handling all of the stopped workers, but with channels it will necessarily result in the number of polls equal to the number of stopped workers, even if additional polls don't do anything.
Therefore, my question is: how do I notify the manager from its workers that they are finished, without resulting in extra polls in the manager? I've tried the following things:
As explained above, regular unbounded channels just won't work.
I thought that maybe bounded channels could work - if I used a channel with capacity 0, and there was a way to try and send a message into it but just drop the message if the channel is full (like the offer() method on Java's BlockingQueue), this seemingly would solve the problem. Unfortunately, the channels API, while providing such a method (try_send() seems to be like it), also has this property of having capacity larger than or equal to the number of senders, which means it can't really be used for such notifications.
Some kind of atomic or a mutex-protected boolean flag also look as if it could work, but there is no atomic or mutex API which would provide a future to wait on, and would also require polling.
Restructure the manager implementation to include JoinHandles into the main select somehow. It might do the trick, but it would result in large refactoring which I'm unwilling to make at this point. If there is a way to do what I want without this refactoring, I'd like to use that first.
I guess some kind of combination of atomics and channels might work, something like setting an atomic flag and sending a message, and then skipping any extra notifications in the manager based on the flag (which is flipped back to off after processing one notification), but this also seems like a complex approach, and I wonder if anything simpler is possible.

I recommend using the FuturesUnordered type from the futures crate. This collection allows you to push many futures of the same type into a collection and wait for any one of them to complete at once.
It implements Stream, so if you import StreamExt, you can use unordered.next() to obtain a future that completes once any future in the collection completes.
If you also need to wait for a timeout or mutex etc., you can use select to create a future that completes once either the timeout or one of the join handles completes. The future returned by next() implements Unpin, so it is usable with select without problems.

Avoiding sqlite3 database locked

I have a multithreaded application that uses sqlite (3.7.3)
I'm hitting the database locked error that seems to be quite prevalent.
I'm wondering how to avoid it in my case.
Let me describe what I'm building. Sorry, no code it's too large and complex.
I have around 8 threads that simultaneously access the database. Any one of those threads can either read or write at the same time.
Each row in a table in the database has a file path that points to a resource + other attributes related to that resource.
3 fields of note are readers, status and del.
Readers is incremented each time a thread reads from the resource, but only if status > 0 and del = 0.
So I have some SQL that does
UPDATE resource set readers=readers+1 where id=? AND del=0 AND status>0
After that, I check the number of rows updated. It should only be 1.
After that I try to read the row back with a select. I do that even if it failed
to update because I need to know the reason that it failed.
I tried wrapping both the update and the select in in a transaction but that didn't help.
I've checked that I'm calling finalize on my statements too.
Now, I thought that sqlite serializes by default. I've tried a couple of open modes but I still get the same error.
And before you ask, no I'm not intending to go to mysql. I absolutely need zero config.
Can someone provide some pointers on how to avoid this type of problem? should I move the readers lock out of the DB? If I do that what mechanism should I replace it with? I'm using Linux under C++ and with the boost library available.
EDIT:
Interestingly, adding COMMIT after my updated call improved things dramatically.

When you open the db, you should configure the 'busy timeout'
int sqlite3_busy_timeout(sqlite3*, int ms);
http://www.sqlite.org/c3ref/busy_timeout.html

First question: are you trying to use one connection with all eight threads? If so, make sure each thread has their own connection. I don't know of any database that likes that.
Also check out the FAQ: http://www.sqlite.org/faq.html
Apparently SQLite has to be compiled with a SQLITE_THREADSAFE preprocessor option set to 1. They do have a method to determine if that's your problem.
Another issue is that writes can only happen from one process safely.

What is an easy way to test whether any process of a given id is presently running on Linux?

In C++, I have a resource that is tied to a pid. Sometimes the process associated with that pid exits abnormally and leaks the resource.
Therefore, I'm thinking of putting the pid in the file that records the resource as being in use. Then when I go to get a resource, if I see an item as registered as being in use, I would search to see whether a process matching the pid is currently running, and if not, clean up the leaked resource.
I realize there is a very small probability that a new unrealated pid is now sharing the same number, but this is better than leaking with no clean up I have now.
Alternatively, perhaps there is a better solution for this, if so, please suggest, otherwise, I'll pursue the pid recording.
Further details: The resource is a port number for communication between a client and a server over tcp. Only one instance of the client may use a given port number on a machine. The port numbers are taken from a range of available port numbers to use. While the client is running, it notes the port number it is using in a special file on disk and then cleans this entry up on exit. For abnormal exit, this does not always get cleaned up and the port number is left annotated as being in use, when it is no longer being used.

To check for existence of process with a given id, use kill(pid,0) (I assume you are on POSIX system). See man 2 kill for details.
Also, you can use waitpid call to be notified when the process finishes.

I would recommend you use some kind of OS resource, not a PID. Mutexes, semaphores, delete-on-close files. All of these are cleaned up by the OS when a process exits.
On Windows, I would recommend a named mutex.
On Linux, I would recommend using flock on a file.

How about a master process that starts your process (the one which terminates abnormally) waits for your process to crash (waitpid) and spawns it again when waitpid returns.
while(1) {
fork exec
waitpid
}

The problem domain isn't clear, unfortunately, you could try re-explaining it in some other way.
But if I understand you correctly, you could create a map like
std::map< ProcessId, boost::shared_ptr<Resource> > map;
// `Resource` here references to some abstract resource type
// and `ProcessId` on Windows system would be basically a DWORD
and in this case you simply have to list every running process (this can be done via EnumProcesses call on Windows) and remove every entry with inappropriate id from your map. After doing this you would have only valid process-resource pairs left. This action can be repeated every YY seconds depending on your needs.
Note that in this case removing an item from your map would basically call the corresponding destructor (because, if your resource is not being used in your code somewhere else, it's reference count would drop to zero).

The API that achieves that on windows are OpenProcess which takes process ID as input, and GetExitCodeProcess which returns STILL_ACTIVE when the process is, well, still active. You could also use any Wait function with zero timeout, but this API seems somewhat cleaner.
As other answers note, however, this doesn't seem a promising road to take. We might be able to give more focused advice if you provide more scenario details. What is your platform? What is the leaked resource exactly? Do you have access to the leaking app code? Can you wrap it in a high-level try-catch with some cleanup? If not, maybe wait on the leaker to finish with a dedicated thread (or dedicated process altogether)? Any detail you provide might help.

How to restrict proccess to create new processes?

How to restrict proccess to create new processes?

You could assign the process to a job object. Use SetInformationJobObject with the JOB_OBJECT_LIMIT_ACTIVE_PROCESS flag to limit the number of processes in that job object to one. Do NOT set the JOB_OBJECT_LIMIT_BREAKAWAY_OK (which would allow the process to create processes that were not part of the job object).
The process could still work around that, such as by starting a new process via the task scheduler or WMI. If you're trying to do something like create a sandbox to run code you really don't trust, this won't adequate. If you have a program that you trust, but just want to place a few limits on what it does, this should be more than adequate.
To put that slightly differently, this is equivalent to locking your car. Somebody can break in (or out, in this case), but at least they have to do a bit more than just walk in unhindered.

On Windows, there isn't a way to stop a processing from spawning other processes. Nor is there on any operating system I know of.
The CreateProcess() system call is available to all processes, thus any process can create a child process.
You could run the process in a sandbox which restricts process creation, but the overhead for this is probably more than you want.
Can I ask why you want to do such a thing?

Use NT Job objects
JOBOBJECT_BASIC_LIMIT_INFORMATION can limit the number of active processes, or use JOBOBJECT_ASSOCIATE_COMPLETION_PORT and kill the new process (If you only need to kill a subset of all new processes)

getrusage() get system time, user time. Unix programming help

I am writing a shell where I need to launch several child processes at once and record the system time and user time.
So far I am able to do it. The only problem is that I am using wait4 to grab the system resources used by the child program and put it in my rusage structure called usage.
How can I launch all the processes at the same time and keep track of the user and system times? I can remove the wait4() system call and use it outside to loop so I can make the parent wait, but if I do that then I can only record the times for the last process and not all of them.
Do you have any idea how I can fix this?
execute(commandPipev,"STANDARD",0);
wait4(pid,&status,0,&usage);
printf("Child process: %s\t PID:%d\n", commandPipev[0], pid);
printf("System time: %ld.%06ld sec\n",usage.ru_stime.tv_sec, usage.ru_stime.tv_usec);
printf("User time: %ld.%06ld sec\n\n",usage.ru_utime.tv_sec, usage.ru_utime.tv_usec);

A convoluted answer.
In a POSIX environment, launch the children, then use waitid() with the WNOWAIT option to tell you that some child has exited. The option leaves the child in a waitable state - that is, you can use another wait-family call to garner the information you need. You can then use the non-POSIX wait4() system call to garner the usage information for the just exited child, and deal with the accounting you need to do. Note that you might find a different process has terminated between the waitid() and wait4() calls; you need to use a loop and appropriate flags and tests to collect all the available corpses (dead child processes) before going back to the waitid() call to find out about the other previously incomplete child processes. You also have to worry about any of the wait-family of functions returning the information for a process that was previously started in the background and has now finished.
The Linux man page for wait4(2) suggests that WNOWAIT might work directly with wait4(2), so you may be able to do it all more cleanly - if, indeed, you need the option at all.
Consider whether you can use process groups to group the child processes together, to make waiting for the members of the process group easier.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js