Are thread and process ids unique? - c++

I am using a static library; it has a function which uses the current time and creates a unique id, which is then inserted into my database. This number should be unique in my database table.
There are two processes running in parallel. Sometimes they simultaneously call this function, and the same number is generated. I get an integrity violation when this happens.
I am thinking of using the process id, the thread id, and the current time. Is this combination unique?
Platform: Windows XP

Use the database to generate them. How to do that depends on the database; Postgres, for example, calls them sequences.

Process and thread ids are unique while the programs are running simultaneously, since the OS needs to differentiate between them, but the system does reuse ids once processes and threads terminate.
So, for your situation, yes, it's a good idea to add either the process id or the thread id to your marker, though I don't think you'd need both.

On Windows, thread Ids are unique throughout the system. See this MSDN library article:
http://msdn.microsoft.com/en-us/library/ms686746%28v=VS.85%29.aspx
The CreateThread and CreateRemoteThread functions also return an identifier that uniquely identifies the thread throughout the system. A thread can use the GetCurrentThreadId function to get its own thread identifier. The identifiers are valid from the time the thread is created until the thread has been terminated. Note that no thread identifier will ever be 0.

The combination of process ID, thread ID and time is unfortunately not guaranteed to be unique. The OS may reuse process IDs and thread IDs once the threads and processes they referred to have terminated. Also, the user may set the clock back, so the same time occurs twice. As others have said, I'd ask the database for a unique ID. Oracle has sequences, MySQL has auto-increment columns, other databases have similar mechanisms.

Whilst the process id and thread id will be unique, it would be better to use the database to generate the unique id for you (as R. Pate suggests), if only because you're potentially limiting your scalability unless you also include a unique machine id.
Though it's reasonably unlikely that one of your processes running on machine A will have the same process id and thread id as one of your processes running on machine B, those are exactly the kinds of bugs that end up getting people out of bed at 4am to deal with the support call...

Well, simply adding the process id and the thread id together can still produce the same number:
pid = 100, tid = 104 (sum 204)
pid = 108, tid = 96 (sum 204)
Not very likely, but possible.
So for near-safe ids, you'll need at least a 64-bit ID field, for example:
ULONG64 id = ((ULONG64)(pid&0xffff) << 48) | ((ULONG64)(tid&0xffff) << 32) | (timestamp & 0xffffffff);
(However, this still does not guarantee uniqueness: it assumes that process and thread ids both fit into 16 bits and that truncating them does not make two different pid/tid pairs collide. That said, I don't think I've ever seen a PID above 65536, and unless you are creating thousands of threads, the thread ids won't wrap around in this value before the timestamp moves on.)
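For completeness, here is a minimal sketch of how the fields in that expression might be filled in on Windows, using GetCurrentProcessId, GetCurrentThreadId and a seconds-resolution timestamp (the function name and exact bit layout are just illustrative, and this is still not a uniqueness guarantee):
#include <windows.h>
#include <ctime>

// Illustrative only: packs pid, tid and a seconds timestamp into one 64-bit
// value as described above.
ULONG64 MakeMarkerId()
{
    ULONG64 pid = GetCurrentProcessId();
    ULONG64 tid = GetCurrentThreadId();
    ULONG64 ts  = static_cast<ULONG64>(std::time(nullptr));   // seconds since the epoch
    return ((pid & 0xffff) << 48) | ((tid & 0xffff) << 32) | (ts & 0xffffffff);
}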


What is the difference between SQLITE_THREADSAFE = 1 vs = 2 and why don't they allow sharing a same connection in multiple threads?

From SQLite compile time options:
... SQLITE_THREADSAFE=1 sets the default threading mode to Serialized. SQLITE_THREADSAFE=2 sets the default threading mode to Multi-threaded ...
It further states:
Multi-thread. In this mode, SQLite can be safely used by multiple
threads provided that no single database connection is used
simultaneously in two or more threads.
Serialized. In serialized mode, SQLite can be safely used by multiple
threads with no restriction.
It's not clear what the use of "Multi-thread" (=2) is if "Serialized" (=1) already allows multi-threaded use without restrictions. The literal meanings of these two quoted terms are also not very clear.
Is sharing a single DB connection across multiple threads disallowed only for the =2 option, or for =1 as well? Is it undefined behaviour if done anyway?
The reason for the second question is that I have a requirement where several DB files are opened at the same time. They are read in a worker thread and written in a single DB thread. If I create 2 connections for each DB file, the OS file descriptor limit can soon be exhausted.
Though we haven't faced any major problem, we recently came across a situation where SQLite was accessed simultaneously from both the worker and DB threads. A long delay of 20 sec blocked the worker thread. This issue reproduces consistently.
This led me to believe that threading could be an issue. In my setup, the default =1 (Serialized) option is set at compile time.
Clarifications:
Environment: Qt/C++. For threading we use QThreads; IMO this should not affect the behaviour.
Threading: There is a main thread, a "database" thread and 4 worker threads. Every user sits on a particular worker thread for its socket connection; however, their DBs are always handled on the common "database" thread.
DB connections: There can be hundreds of different DBs opened at a time, depending on the number of users connected to the server. Since every OS limits how many files can be open at a time, I use 1 connection per DB file.
Connection sharing: Every user's DB connection is shared between its worker thread for reading (SELECT) and the common DB thread for writing (INSERT/DELETE/UPDATE). I assumed that with =1 the connection can safely be shared.
Suspicion: There is one table with 10k+ rows that also holds large data in its columns; the total DB size goes up to 300-400 MB mainly because of it. When a SELECT is run against this table on its "id" field (a 30-character string), the first time it takes up to 20 sec; the next time, it takes a few milliseconds.
Well I'm no SQLite expert, but the following sounds quite clear to me:
Multi-thread. In this mode, SQLite can be safely used by multiple threads provided that no single database connection is used simultaneously in two or more threads.
Serialized. In serialized mode, SQLite can be safely used by multiple threads with no restriction.
From my understanding this means:
If you use SQLITE_THREADSAFE=2 (multi-threaded) you have to make sure that each thread uses its own database connection. Sharing a single database connection amongst multiple threads isn't safe.
If you use SQLITE_THREADSAFE=1 (serialized) you can even safely share a single database connection amongst multiple threads.
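To illustrate, the threading mode can also be chosen per connection via sqlite3_open_v2 (this assumes the library itself was built with SQLITE_THREADSAFE=1 or 2; a =0 build has no mutexes at all, and error handling is omitted here):
#include <sqlite3.h>

// Serialized: this single connection may then be shared between threads.
sqlite3 *open_serialized(const char *path)
{
    sqlite3 *db = nullptr;
    sqlite3_open_v2(path, &db,
        SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX, nullptr);
    return db;
}

// Multi-thread: also fine, but then every thread must open its own connection.
sqlite3 *open_multithread(const char *path)
{
    sqlite3 *db = nullptr;
    sqlite3_open_v2(path, &db,
        SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_NOMUTEX, nullptr);
    return db;
}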

Database access with threading

I'm developing a program (using C++ running on a Linux machine) that uses SQLite as a back-end.
It has 2 threads which carry out the following tasks:
Thread 1
Waits for a piece of data to arrive (in this case, via a radio module)
Immediately inserts it into the database
Returns to waiting for new data
It is important that this thread is "listening" as much of the time as possible and isn't blocked waiting to insert into the database
Thread 2
Every 2 minutes, runs a SELECT on the database to find un-processed data
Processes the data
UPDATEs the rows fetched with a flag to show they have been processed
The key thing is to make sure that Thread 1 can always INSERT into the database, even if this means that Thread 2 is unable to SELECT or UPDATE (that can just happen at a later point; the timing isn't critical).
I was hoping to find a way to prioritise INSERTs somehow using SQLite, but have failed to find one so far. Another thought was for Thread 1 to push its data into a basic in-memory queue and then bulk INSERT it every so often (this wouldn't block the receiving of data; it could do a simple check to see if the database was locked and, if so, wait a few milliseconds and try again).
However, what is the "proper" way to do this with SQLite and C++ threads?
An SQLite database can be opened with or without multi-threading support. Each of the two threads should open its own connection to the database.
If you want to do it the hard way, you can use a priority queue and process the queries yourself; a sketch of that idea (with a plain FIFO queue) follows.
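Assuming the radio thread only ever pushes and a single writer thread drains the queue and performs the INSERTs (a design choice, not something the question mandates), a rough sketch might look like this; insert_row is a hypothetical stand-in for the actual sqlite3 calls:
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

std::queue<std::string> pending;    // data waiting to be written
std::mutex m;
std::condition_variable cv;

// Hypothetical: performs the actual sqlite3 INSERT for one row.
void insert_row(const std::string& /*row*/) { /* sqlite3_prepare_v2 / step / finalize */ }

// Radio thread: called for every packet received; never touches SQLite.
void on_data_received(std::string packet)
{
    {
        std::lock_guard<std::mutex> lock(m);
        pending.push(std::move(packet));
    }
    cv.notify_one();
}

// Writer thread: drains the queue and INSERTs in batches.
void writer_loop()
{
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !pending.empty(); });
        std::queue<std::string> batch;
        std::swap(batch, pending);          // take everything queued so far
        lock.unlock();

        // Wrap the batch in one transaction (BEGIN ... COMMIT) for speed.
        while (!batch.empty()) {
            insert_row(batch.front());
            batch.pop();
        }
    }
}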

Safe way to cache PID to Port Mapping Windows

I'm using WinDivert to pipe connections (TCP and UDP) through a transparent proxy on Windows. This works by doing a port-to-pid lookup using functions like GetTcpTable2, then checking whether the PID matches the PID of the proxy or any of its child processes. If they don't match, the packets get forwarded through the proxy; if they do, the packets are left untouched.
My question is: is there a safe way, or a safe duration, for which I can "cache" the results of that port-to-pid lookup? Whenever a lot of packets flow through, say while watching a video on YouTube, the code using WinDivert suddenly chomps up all of my CPU, and I'm assuming this is from doing a TcpTable2 lookup on every packet received. I can see that with UDP there isn't really a safe duration for which I can assume the same process is bound to a port, but is this possible with TCP?
As a complement to Luis's comment, I think the application that caches the port-to-pid lookup could also keep a handle to each process (just get it through OpenProcess). The catch is that the resources associated with a process are not freed until all handles to it are closed. That is normal, because as long as you hold a valid handle to a process you can still query the system for various information about it, such as memory usage or times. So you should periodically check whether the cached processes have terminated, purge those entries from the cache, and close the handles.
As an alternative, you could keep an extra piece of information, such as the start time of the process, which is accessible through GetProcessTimes. When you find a process id in the cache, you open the process and check its start time: if it matches, it is the right process; if not, the process id has been reused and you should purge the entry from the cache.
The first way should be more efficient because you do not have to re-open the process for each packet, but you have to be stricter about detecting terminated processes in order to release resources, maybe with a thread that uses WaitForMultipleObjectsEx on all the process handles so it is alerted as soon as one terminates.
The second way should be simpler to implement; a rough sketch follows.
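Here is what that start-time check might look like (real Win32 calls, but error handling is trimmed, and PROCESS_QUERY_LIMITED_INFORMATION assumes Vista or later):
#include <windows.h>

// True if `pid` still refers to the same process that was created at `expectedCreation`.
bool SamePidSameProcess(DWORD pid, const FILETIME& expectedCreation)
{
    HANDLE h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (!h)
        return false;                                    // gone, or access denied

    FILETIME creationTime, exitTime, kernelTime, userTime;
    bool same = GetProcessTimes(h, &creationTime, &exitTime, &kernelTime, &userTime)
                && CompareFileTime(&creationTime, &expectedCreation) == 0;
    CloseHandle(h);
    return same;
}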
So, all I ended up doing here was using two std::unordered_maps. One map stores the port number as the key and, as the value, the last system time in milliseconds at which the TCP table was queried to find the process ID bound to that port. If the key doesn't exist, or more than 2 seconds have passed since that time, a fresh query to the TCP table is made to re-check the PID bound to the port. After that check, we update the second map, which uses the port number as the key and holds the PID found for that port on the last query. This gives us a 2-second cache on lookups, which dropped peak CPU usage from well over 50% down to a max of 3%. A simplified sketch of the idea is below.
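Something along these lines, with the actual table walk hidden behind a hypothetical query_pid_from_tcp_table function:
#include <chrono>
#include <cstdint>
#include <unordered_map>

using steady = std::chrono::steady_clock;

std::unordered_map<std::uint16_t, steady::time_point> last_lookup;  // port -> time of last table query
std::unordered_map<std::uint16_t, std::uint32_t>      port_to_pid;  // port -> pid found by that query

// Hypothetical: walks the GetTcpTable2 (or UDP table) output for this port.
std::uint32_t query_pid_from_tcp_table(std::uint16_t /*port*/) { return 0; }

std::uint32_t pid_for_port(std::uint16_t port)
{
    const auto now = steady::now();
    const auto it = last_lookup.find(port);
    if (it == last_lookup.end() || now - it->second > std::chrono::seconds(2)) {
        port_to_pid[port] = query_pid_from_tcp_table(port);   // refresh the 2-second cache
        last_lookup[port] = now;
    }
    return port_to_pid[port];
}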

Race condition with Performance Counters for current process

I am trying to get around the old "How do I get a Windows Performance Counter for the current process" issue. Basically I am enumerating Process Object instances to get a list of Process objects that I can then query for their process id and compare to my own.
Based on this I can build a performance counter path using the correct instance index (to create something similar to \Process(my_program#3)\<counter>) that I can then use to query whatever counter it is that I am interested in. But what happens if one or more of the other instances of my_program exit prior to the PdhAddCounter call? If I understand correctly, this would mean that my counter path now refers to a different process or is now invalid. They might even disappear while querying for the process id...
How do I prevent the counter path from becoming invalid before I can use it to get a counter handle?
Wow, you are right. This seems like a major design flaw to me. Basically it is impossible to reliably monitor an instance if its name is not unique. I did stumble across a workaround specifically for the Process and Thread objects, but that's a global setting that could affect other applications.
I think the safest way to do this would be to watch all process objects and, each time you collect the data, go through and find the one with the desired process id; a sketch of that approach is below.
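With the PDH API, that might look roughly like this: add the wildcard counter "\Process(*)\ID Process", collect once, and scan the returned instances for the pid you care about (error handling and buffer-size checks are compressed; link against pdh.lib):
#include <windows.h>
#include <pdh.h>
#include <string>
#include <vector>

// Returns the instance name (e.g. L"my_program#3") whose "ID Process" equals pid,
// or an empty string if it was not present in this collection pass.
std::wstring FindInstanceForPid(DWORD pid)
{
    PDH_HQUERY query = nullptr;
    PDH_HCOUNTER counter = nullptr;
    PdhOpenQueryW(nullptr, 0, &query);
    PdhAddCounterW(query, L"\\Process(*)\\ID Process", 0, &counter);
    PdhCollectQueryData(query);

    DWORD bytes = 0, count = 0;
    PdhGetFormattedCounterArrayW(counter, PDH_FMT_LONG, &bytes, &count, nullptr); // ask for size
    std::vector<unsigned char> buffer(bytes);
    auto* items = reinterpret_cast<PDH_FMT_COUNTERVALUE_ITEM_W*>(buffer.data());
    PdhGetFormattedCounterArrayW(counter, PDH_FMT_LONG, &bytes, &count, items);

    std::wstring found;
    for (DWORD i = 0; i < count; ++i)
        if (static_cast<DWORD>(items[i].FmtValue.longValue) == pid)
            found = items[i].szName;

    PdhCloseQuery(query);
    return found;
}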

What is an easy way to test whether any process of a given id is presently running on Linux?

In C++, I have a resource that is tied to a pid. Sometimes the process associated with that pid exits abnormally and leaks the resource.
Therefore, I'm thinking of putting the pid in the file that records the resource as being in use. Then, when I go to get a resource and see an item registered as in use, I would check whether a process with that pid is currently running and, if not, clean up the leaked resource.
I realize there is a very small probability that a new, unrelated process is now sharing the same pid, but this is better than what I have now, which is leaking with no cleanup at all.
Alternatively, perhaps there is a better solution for this; if so, please suggest it. Otherwise, I'll pursue the pid recording.
Further details: The resource is a port number for communication between a client and a server over tcp. Only one instance of the client may use a given port number on a machine. The port numbers are taken from a range of available port numbers to use. While the client is running, it notes the port number it is using in a special file on disk and then cleans this entry up on exit. For abnormal exit, this does not always get cleaned up and the port number is left annotated as being in use, when it is no longer being used.
To check for the existence of a process with a given id, use kill(pid, 0) (I assume you are on a POSIX system). See man 2 kill for details.
Also, you can use the waitpid call to be notified when the process finishes (for child processes, that is). A minimal example of the kill check is below.
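Something like this; signal 0 performs the existence and permission checks without actually delivering a signal:
#include <sys/types.h>
#include <signal.h>
#include <errno.h>

// True if a process with this pid exists. EPERM means it exists but we
// are not allowed to signal it; ESRCH means there is no such process.
bool pid_is_running(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}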
I would recommend you use some kind of OS resource, not a PID. Mutexes, semaphores, delete-on-close files. All of these are cleaned up by the OS when a process exits.
On Windows, I would recommend a named mutex.
On Linux, I would recommend using flock on a file.
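For the flock approach, a minimal sketch: the client takes an exclusive lock on a per-port lock file and keeps the descriptor open for its lifetime; the kernel releases the lock automatically when the process dies, so a crash can never leave the port marked as in use (the path below is just an example):
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Try to claim `port`; returns the fd to keep open for the life of the process,
// or -1 if another live process already holds the lock.
int claim_port_lock(int port)
{
    char path[64];
    std::snprintf(path, sizeof path, "/tmp/myapp-port-%d.lock", port);  // example path
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {    // a live process already owns this port
        close(fd);
        return -1;
    }
    return fd;    // released automatically on exit or crash
}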
How about a master process that starts your process (the one which terminates abnormally), waits for it to exit (waitpid), and spawns it again when waitpid returns:
while (1) {
    pid_t pid = fork();
    if (pid == 0) {             /* child: the path below is just a placeholder */
        execl("/path/to/your/program", "your_program", (char *)NULL);
        _exit(127);             /* only reached if exec fails */
    }
    waitpid(pid, NULL, 0);      /* parent: block until the child exits, then respawn */
}
The problem domain isn't clear, unfortunately; you could try re-explaining it in some other way.
But if I understand you correctly, you could create a map like
std::map< ProcessId, boost::shared_ptr<Resource> > map;
// `Resource` here refers to some abstract resource type
// and `ProcessId` on a Windows system would basically be a DWORD
and in this case you simply have to list every running process (this can be done via the EnumProcesses call on Windows) and remove from your map every entry whose id no longer corresponds to a running process. After doing this you would have only valid process-resource pairs left. This action can be repeated every YY seconds, depending on your needs.
Note that in this case removing an item from your map would basically call the corresponding destructor (because, if your resource is not being used somewhere else in your code, its reference count would drop to zero).
The APIs that achieve that on Windows are OpenProcess, which takes a process ID as input, and GetExitCodeProcess, which returns STILL_ACTIVE when the process is, well, still active. You could also use any Wait function with a zero timeout, but this API seems somewhat cleaner. A short sketch is below.
As other answers note, however, this doesn't seem a promising road to take. We might be able to give more focused advice if you provide more scenario details. What is your platform? What is the leaked resource exactly? Do you have access to the leaking app code? Can you wrap it in a high-level try-catch with some cleanup? If not, maybe wait on the leaker to finish with a dedicated thread (or dedicated process altogether)? Any detail you provide might help.
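For reference, roughly like this (Win32; note that STILL_ACTIVE is the value 259, so a process that deliberately exits with code 259 would be misreported as alive):
#include <windows.h>

// True if a process with this id exists and has not exited yet.
bool IsProcessAlive(DWORD pid)
{
    HANDLE h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (!h)
        return false;                 // no such process (or access denied)

    DWORD code = 0;
    bool alive = GetExitCodeProcess(h, &code) && code == STILL_ACTIVE;
    CloseHandle(h);
    return alive;
}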