WordNet thread safety - C++

I am using WordNet in a C++ project (although the library itself is in C). Specifically, I am calling only two functions:
findtheinfo_ds
traceptrs_ds
Now, if I understand the underlying structure correctly (it's quite old, as it was written in the late nineties I think), the library uses files as the database from which it retrieves the buffered results I get.
However, I am not sure about the thread safety of the library.
My current algorithm is:
SynsetPtr head = findtheinfo_ds( query, NOUN, HYPERPTR, ALLSENSES );
SynsetPtr syn = head;
// Iterate all senses
while ( syn )
{
    for ( int i = 0; i < syn->wcount; i++ )
        std::cout << "synonym: " << syn->words[i] << std::endl;

    int depth = 0; // indentation level for the hypernym trace
    SynsetPtr ptr = traceptrs_ds( syn, HYPERPTR, NOUN, 1 );
    while ( ptr )
    {
        for ( int x = 0; x <= depth; x++ )
            std::cout << "\t";
        for ( int w = 0; w < ptr->wcount; w++ )
            std::cout << ptr->words[w] << ", ";
        std::cout << std::endl;
        depth++;
        SynsetPtr old_ptr = ptr;
        ptr = traceptrs_ds( ptr, HYPERPTR, NOUN, 1 );
        free_syns( old_ptr );
    }
    syn = syn->nextss;
}
if ( head )
    free_syns( head ); // frees the whole nextss chain at once
However, I want to run parallel threads, searching for different words at the same time.
I understand that most UNIX/Linux distributions of today have thread-safe file system calls.
Furthermore, each thread would execute the loop above independently, for its own query.
What I am worried about is that, before this loop, a
wninit();
call has to take place, which makes me assume that a singleton is initialized somewhere in the library. I cannot take a peek at the code, as it is closed-source, and I do not have access to that singleton, as wninit() only returns an int indicating success.
Is there any way to either:
Ensure thread-safety in this scenario, or
Find out (through any possible means) whether the library is thread-safe?
It is loaded dynamically from a Debian package called wordnet-base, which installs libwordnet-3.0.so.
Many thanks to anyone who can help!

Well, the only way to ensure that a library is really thread-safe is to analyze its code. Or simply ask its author and then trust his/her answer :). Usually data stored on disk isn't the cause of thread-unsafety, but there are a lot of places where code may break in a multi-threaded environment. One has to check for global variables, the existence of variables declared static inside library functions, etc.
There is, however, a solution which can be used if you don't have the time and/or inclination to study the code. You may use a multiprocess technique where parallel tasks are performed by worker processes, not worker threads, and a director process prepares job units for the workers and collects the results. Depending on the task, such workers may be implemented as FastCGI, or communicate with the parent using Boost.Interprocess.
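A minimal sketch of that worker-process idea (mine, not the answerer's), assuming plain fork(): each child gets a private copy of the library's globals, so whatever wninit() sets up is never shared between parallel lookups. The actual WordNet calls are left commented out:

#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> queries = { "cat", "dog", "tree" };
    for ( const auto& q : queries )
    {
        if ( fork() == 0 ) // child: private address space, private library state
        {
            // wninit();                             // safe: no other thread here
            // SynsetPtr syn = findtheinfo_ds(...);  // do the lookup, print results
            std::printf( "worker %d handles '%s'\n", (int)getpid(), q.c_str() );
            _exit( 0 );
        }
    }
    while ( wait( nullptr ) > 0 ) {} // the director collects its workers
    return 0;
}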

Related

Shared Lua state between pthreads seg-faults if not executing a coroutine

First of all, I know my question looks familiar, but I am actually not asking why a seg-fault occurs when sharing a Lua state between different pthreads. I am asking why it doesn't seg-fault in the specific case described below.
I tried to organize it as well as I could, but I realize it is very long. Sorry about that.
A bit of background:
I am writing a program which is using the Lua interpreter as a base for the user to execute instructions and using the ROOT libraries (https://root.cern.ch/) to display graphs, histograms, etc...
All of this is working just fine but then I tried to implement a way for the user to start a background task while keeping the ability to input commands in the Lua prompt, to be able to do something else entirely while the task finishes, or to request to stop it for instance.
My first attempt was the following:
First, on the Lua side, I load some helper functions and initialize global variables:
-- Lua script
RootTasks = {}
NextTaskToStart = nil

function SetupNewTask(taskname, fn, ...)
    local task = function(...)
        local rets = table.pack(fn(...))
        RootTasks[taskname].status = "done"
        return table.unpack(rets)
    end
    RootTasks[taskname] = {
        task = SetupNewTask_C(task, ...),
        status = "waiting",
    }
    NextTaskToStart = taskname
end
Then on the C side
// inside the C++ source
int SetupNewTask_C ( lua_State* L )
{
    // just a function to check if the argument is valid
    if ( !CheckLuaArgs ( L, 1, true, "SetupNewTask_C", LUA_TFUNCTION ) ) return 0;

    int nvals = lua_gettop ( L );
    lua_newtable ( L );

    for ( int i = 0; i < nvals; i++ )
    {
        lua_pushvalue ( L, 1 );
        lua_remove ( L, 1 );
        lua_seti ( L, -2, i+1 );
    }

    return 1;
}
Basically, the user provides the function to execute followed by the parameters to pass, and this just pushes a table with the function to execute as the first field and the arguments as the subsequent fields. This table is pushed on top of the stack; I retrieve it and store it in a global variable.
The next step is on the Lua side
-- Lua script
function StartNewTask(taskname, fn, ...)
    SetupNewTask(taskname, fn, ...)
    StartNewTask_C()
    RootTasks[taskname].status = "running"
end
and on the C side
// In the C++ source
// "lua", used below, is a pointer to the lua_State
// created when starting the Lua interpreter
void* NewTaskFn ( void* arg )
{
    // TryGetGlobalField is a helper function to get global fields from
    // strings like "something.field.subfield"

    // Retrieve the name of the task to be started (it has been pushed as
    // a global variable by the previous call to SetupNewTask_C)
    TryGetGlobalField ( lua, "NextTaskToStart" );
    if ( lua_type ( lua, -1 ) != LUA_TSTRING )
    {
        cerr << "Next task to schedule is undetermined..." << endl;
        return nullptr;
    }
    string nextTask = lua_tostring ( lua, -1 );
    lua_pop ( lua, 1 );

    // Now we get the actual table with the function to execute
    // and the arguments
    TryGetGlobalField ( lua, ( string ) ( "RootTasks." + nextTask ) );
    if ( lua_type ( lua, -1 ) != LUA_TTABLE )
    {
        cerr << "This task does not exist or has an invalid format..." << endl;
        return nullptr;
    }

    // The field "task" of the previous table contains the
    // function and the arguments
    lua_getfield ( lua, -1, "task" );
    if ( lua_type ( lua, -1 ) != LUA_TTABLE )
    {
        cerr << "This task has an invalid format..." << endl;
        return nullptr;
    }
    lua_remove ( lua, -2 );

    int taskStackPos = lua_gettop ( lua );

    // The first element of the table we retrieved is the function, so the
    // number of arguments for that function is the table length - 1
    int nargs = lua_rawlen ( lua, -1 ) - 1;

    // That will be the function
    lua_geti ( lua, taskStackPos, 1 );

    // And the arguments...
    for ( int i = 0; i < nargs; i++ )
    {
        lua_geti ( lua, taskStackPos, i+2 );
    }
    lua_remove ( lua, taskStackPos );

    // Reset the global variable NextTaskToStart, as we are
    // about to start the scheduled one
    lua_pushnil ( lua );
    TrySetGlobalField ( lua, "NextTaskToStart" );

    // Let's go!
    lua_pcall ( lua, nargs, LUA_MULTRET, 0 );

    return nullptr; // was missing: a pthread start routine must return a value
}

int StartNewTask_C ( lua_State* L )
{
    pthread_t newTask;
    pthread_create ( &newTask, nullptr, NewTaskFn, nullptr );
    return 0;
}
So, for instance, a call in the Lua interpreter to
> StartNewTask("PeriodicPrint", function(str) for i=1,10 do print(str);
>> sleep(1); end end, "Hello")
will print "Hello" every second for the next 10 seconds. It will then return from execution, and everything is wonderful.
Now, if I ever hit the ENTER key while that task is running, the program dies in horrible seg-fault suffering (which I don't copy here, as the error log is different each time it seg-faults; sometimes there is no error at all).
So I read a bit online about what could be the matter, and I found several mentions that lua_State is not thread-safe. I don't really understand why just hitting ENTER makes it flip out, but that's not really the point here.
I discovered by accident that this approach can work without any seg-faulting with a tiny modification: instead of running the function directly, executing it through a coroutine makes everything I wrote above work just fine.
Replace the previous Lua-side function SetupNewTask with:
function SetupNewTask(taskname, fn, ...)
    local task = coroutine.create( function(...)
        local rets = table.pack(fn(...))
        RootTasks[taskname].status = "done"
        return table.unpack(rets)
    end)
    local taskfn = function(...)
        coroutine.resume(task, ...)
    end
    RootTasks[taskname] = {
        task = SetupNewTask_C(taskfn, ...),
        routine = task,
        status = "waiting",
    }
    NextTaskToStart = taskname
end
I can execute several tasks at once for extended periods of time without getting any seg-faults. So we finally come to my question:
Why does using a coroutine work? What is the fundamental difference in this case? I just call coroutine.resume and never yield (or anything else, for that matter), then just wait for the coroutine to be done, and that's it.
Are coroutines doing something I do not suspect?
That it seems as if nothing broke doesn't mean that it actually works, so…
What's in a lua_State?
(This is what a coroutine is.)
A lua_State stores this coroutine's state – most importantly its stack, CallInfo list, a pointer to the global_State, and a bunch of other stuff.
If you hit return in the REPL of the standard Lua interpreter, the interpreter tries to run the code you typed. (An empty line is also a program.) This involves putting it on the Lua stack, calling some functions, etc. etc. If you have code running in a different OS thread that is also using the same Lua stack/state… well, I think it's clear why this breaks, right? (One part of the problem is caching of stuff that "doesn't"/shouldn't change (but changes because another thread is also messing with it). Both threads are pushing/popping stuff on the same stack and step on each other's feet. If you want to dig through the code, luaV_execute may be a good starting point.)
So now you're using two different coroutines, and all the obvious sources of problems are gone. Now it works, right…? Nope, because coroutines share state,
The global_State!
This is where the "registry", string cache, and all the things related to garbage collection live. And while you got rid of the main "high-frequency" source of errors (stack handling), many many other "low-frequency" sources remain. A brief (non-exhaustive!) list of some of them:
You can potentially trigger a garbage collection step by any allocation, which will then run the GC for a bit, which uses its shared structures. And while allocations usually don't trigger the GC, the GCdebt counter that controls this is part of the global state, so once it crosses the threshold, allocations on multiple threads at the same time have a good chance of starting the GC on several threads at once. (If that happens, it'll almost certainly explode violently.) Any allocation means, among other things:
creating tables, coroutines, userdata, …
concatenating strings, reading from files, tostring(), …
calling functions(!) (if that requires growing the stack or allocating a new CallInfo slot)
etc.
(Re-)Setting a thing's metatable may modify GC structures. (If the metatable has __gc or __mode, it gets added to a list.)
Adding new fields to tables, which may trigger a resize. If you're also accessing it from another thread during the resize (even just reading existing fields), well… *boom*. (Or not boom, because while the data may have moved to a different area, the memory where it was before is probably still accessible. So it might "work" or only lead to silent corruption.)
Even if you stopped the GC, creating new strings is unsafe because it may modify the string cache.
And then probably lots of other things…
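(A side note, not from the original answer: the usual low-tech workaround is to serialize every use of the shared lua_State behind a single mutex, so that the stack, GC, and string cache are only ever touched by one OS thread at a time. A minimal sketch, assuming Lua 5.2+ for LUA_OK:)

#include <pthread.h>
#include <cstdio>
#include <lua.hpp>

static pthread_mutex_t lua_mutex = PTHREAD_MUTEX_INITIALIZER;

// hypothetical helper: run a global Lua function by name, fully serialized
void call_lua_global ( lua_State* L, const char* name )
{
    pthread_mutex_lock ( &lua_mutex );
    lua_getglobal ( L, name );
    if ( lua_pcall ( L, 0, 0, 0 ) != LUA_OK )
    {
        std::fprintf ( stderr, "lua error: %s\n", lua_tostring ( L, -1 ) );
        lua_pop ( L, 1 ); // pop the error message
    }
    pthread_mutex_unlock ( &lua_mutex );
}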
Making it Fail
For fun, you can re-build Lua and #define both HARDSTACKTESTS and HARDMEMTESTS (e.g. at the very top of luaconf.h). This will enable some code that will reallocate the stack and run a full GC cycle in many places. (For me, it does 260 stack reallocations and 235 collections just until it brings up the prompt. Just hitting return (running an empty program) does 13 stack reallocations and 6 collections.) Running your program that seems to work with that enabled will probably make it crash… or maybe not?
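For instance (my paraphrase of the instruction above), at the very top of luaconf.h:

/* enable aggressive stack reallocation and GC stress testing */
#define HARDSTACKTESTS
#define HARDMEMTESTS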
Why it might still "work"
So for instance a call in the Lua interpreter to
StartNewTask("PeriodicPrint", function(str)
for i=1,10 print(str); sleep(1); end
end, "Hello")
will print "Hello" every second for the next 10 seconds.
In this particular example, there's not much happening. All the functions and strings are allocated before you start the thread. Without HARDSTACKTESTS, you might be lucky and the stack is already big enough. Even if the stack needs to grow, the allocation (& collection cycle because HARDMEMTESTS) may have the right timing so that it doesn't break horribly. But the more "real work" that your test program does, the more likely it will be that it will crash. (One good way to do that is to create lots of tables and stuff so the GC needs more time for the full cycle and the time window for interesting race conditions gets bigger. Or maybe just repeatedly run a dummy function really fast like for i = 1, 1e9 do (function() return i end)() end on 2+ threads and hope for the best… err, worst.)

Basic Smart Card testing in Windows

I am attempting to simply test for the presence of a Smart Card in Windows. The goal is to have a "daemon" running that will perform an action whenever (and for the duration) a card is inserted.
I have zero experience with things of this nature. I have read the documentation for SCardStatus and such, but I don't understand how the whole API works so I'm a bit lost.
What would be most helpful for me is if someone has a very simple example of a complete program that simply tests for the presence of a card (preferably in C++ but I'll take what I can get!). I would be most appreciative. I don't need any card status other than it exists. Thanks!
If you work on Windows you need to use the WinSCard API; if you use Unix, use PC/SC. These two APIs are very similar because of the standards, but the WinSCard API is much bigger and offers many more functions. Both APIs are implemented in C, but you can wrap them in C++ pretty easily. I just want to point out that if you are going to wrap these two APIs in C++ for use on both Windows AND Unix, take a look at the numerical values of the smart card protocols; they differ between the platforms.
Basics:
You need to establish a context (it's like creating a smart card manager):
SCardEstablishContext
It takes 4 parameters but, for basic use, you only need 2: the scope and a pointer to the context handle.
SCARDCONTEXT hSCardContext = 0; // note: SCARDCONTEXT, not the pointer type LPSCARDCONTEXT
LONG ret = SCardEstablishContext(SCARD_SCOPE_USER, NULL, NULL, &hSCardContext);
if (ret != SCARD_S_SUCCESS) ... // handle error
Smart card readers are grouped into different groups, so there are functions to work with groups, to create them, and so on.
To get the list of readers (for basic applications you don't really need groups), use
SCardListReaders
It takes 4 parameters: the context, a pointer to the groups, a pointer to the readers buffer, and a pointer to the buffer length.
You can use it like this:
char *szGroups = NULL;
DWORD readers = 0;
LONG res = SCardListReaders(hSCardContext, szGroups, NULL, &readers);
// handle errors
Passing NULL for the buffer first gives you the required buffer length in readers. Now you can allocate memory for the actual reader names (a multi-string: nul-separated names, double-nul-terminated):
char *szReaders = (char *) malloc(sizeof(char) * readers);
res = SCardListReaders(hSCardContext, szGroups, szReaders, &readers);
Now you have the list of connected readers.
You can connect to a reader like so
SCARDHANDLE hSCard = 0; // again, the handle type, not the pointer type
DWORD activeProtocols = 0;
LONG ret = SCardConnect(hSCardContext, myReader, SCARD_SHARE_EXCLUSIVE, SCARD_PROTOCOL_Tx, &hSCard, &activeProtocols);
// .. handle errors
Specify the protocols and the share mode; use SCARD_SHARE_EXCLUSIVE for the share mode if you're working with sensitive stuff which needs protection, so that the OS won't interleave other transactions with yours.
Once again, if you are wrapping for both Windows and Unix: Unix does not have the SCARD_PROTOCOL_Tx constant, but it is just a shorthand for the combination SCARD_PROTOCOL_T0 | SCARD_PROTOCOL_T1.
myReader is the name of a connected reader, like (LPCTSTR)"Dermalog LF10";
you get those reader names from the SCardListReaders function.
Now you are connected to a card. With SCARD_SHARE_EXCLUSIVE sharing, don't forget to disconnect and release the smart card context when you are done, because otherwise other applications will deadlock waiting for the card.
Use SCardDisconnect to disconnect; it takes 2 parameters, the smart card handle and a disposition. For a basic application the SCARD_LEAVE_CARD disposition should be OK: it specifies that you don't want to do anything special to the card, such as ejecting it.
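A minimal sketch of that cleanup sequence (my own, just stringing together the calls described above):

// disconnect without resetting/ejecting the card, then free the context
SCardDisconnect(hSCard, SCARD_LEAVE_CARD);
SCardReleaseContext(hSCardContext);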
Transactions are more complex, because you need to know the smart card standards and so on, but this covers the basics.
Keep in mind that this code might not compile as-is; you may need to fix up the types. On Windows you need to cast to the WinAPI types, like LPCTSTR, so the compiler won't complain, and Unix doesn't have such types, so you will need to work around these problems too.
This sample code assumes the card readers are plugged in at the start; it does not handle a changing number of card readers.
Other than that, it simply spams the console with the inserted / not inserted status of the cards.
Please do not use this as-is in production code; most error checking is omitted and some shortcuts are taken to keep the code short(ish).
#pragma comment(lib, "winscard.lib")
#include <windows.h>
#include <winscard.h>
#include <iostream>
#include <cstring>
#include <vector>
bool test()
{
    DWORD dwReaders;
    LPSTR szReaders = NULL;
    SCARDCONTEXT hContext;
    bool bRunning = true;
    std::vector<const char*> cards;

    LONG status = SCardEstablishContext(SCARD_SCOPE_USER, NULL, NULL, &hContext);
    if( status != SCARD_S_SUCCESS ) {
        return false;
    }

    dwReaders = SCARD_AUTOALLOCATE;
    if( SCardListReadersA(hContext, NULL, (LPSTR)&szReaders, &dwReaders) == SCARD_S_SUCCESS ) {
        // szReaders is a multi-string: walk the nul-separated reader names
        LPSTR reader = szReaders;
        while (reader != NULL && *reader != '\0') {
            std::cout << "Reader name: '" << reader << "'" << std::endl;
            cards.push_back( reader );
            reader += strlen(reader)+1;
        }

        LPSCARD_READERSTATEA lpState = new SCARD_READERSTATEA[cards.size()];
        for( size_t n = 0; n < cards.size(); ++n ) {
            memset( lpState + n, 0, sizeof(SCARD_READERSTATEA) );
            lpState[n].szReader = cards[n];
        }

        do {
            status = SCardGetStatusChangeA( hContext, 500, lpState, (DWORD)cards.size() );
            switch( status )
            {
            case SCARD_S_SUCCESS:
            case SCARD_E_TIMEOUT:
                for( size_t n = 0; n < cards.size(); ++n ) {
                    if( lpState[n].dwEventState & SCARD_STATE_PRESENT ) {
                        std::cout << "'" << lpState[n].szReader << "' present" << std::endl;
                    } else {
                        std::cout << "'" << lpState[n].szReader << "' not present" << std::endl;
                    }
                }
                break;
            default:
                std::cout << "Other result: " << status << std::endl;
                break;
            }
            Sleep( 1000 ); // do not spam too bad
        } while( bRunning );

        delete[] lpState;
        // only do this after being done with the strings, or handle the names another way!
        SCardFreeMemory( hContext, szReaders );
    }

    SCardReleaseContext( hContext );
    return true;
}

Shared-memory IPC synchronization (lock-free)

Consider the following scenario:
Requirements:
Intel x64 Server (multiple CPU-sockets => NUMA)
Ubuntu 12, GCC 4.6
Two processes sharing large amounts of data over (named) shared-memory
Classical producer-consumer scenario
Memory is arranged in a circular buffer (with M elements)
Program sequence (pseudo code):
Process A (Producer):
int bufferPos = 0;
while( true )
{
if( isBufferEmpty( bufferPos ) )
{
writeData( bufferPos );
setBufferFull( bufferPos );
bufferPos = ( bufferPos + 1 ) % M;
}
}
Process B (Consumer):
int bufferPos = 0;
while( true )
{
if( isBufferFull( bufferPos ) )
{
readData( bufferPos );
setBufferEmpty( bufferPos );
bufferPos = ( bufferPos + 1 ) % M;
}
}
Now the age-old question: How to synchronize them effectively!?
Protect every read/write access with mutexes
Introduce a "grace period", to allow writes to complete: Read data in buffer N, when buffer(N+3) has been marked as full (dangerous, but seems to work...)
?!?
Ideally I would like something along the lines of a memory-barrier, that guarantees that all previous reads/writes are visible across all CPUs, along the lines of:
writeData( i );
MemoryBarrier();
//All data written and visible, set flag
setBufferFull( i );
This way, I would only have to monitor the buffer flags and then could read the large data chunks safely.
Generally I'm looking for something along the lines of acquire/release fences as described by Preshing here:
http://preshing.com/20130922/acquire-and-release-fences/
(If I understand it correctly, the C++11 atomics only work for threads of a single process, not across multiple processes.)
However, the GCC-provided memory barriers (__sync_synchronize, in combination with the compiler barrier asm volatile( "" ::: "memory" ) to be sure) don't seem to work as expected, as writes become visible after the barrier when I expected them to be completed before it.
Any help would be appreciated...
BTW: Under Windows this just works fine using volatile variables (a Microsoft-specific behaviour)...
Boost Interprocess has support for Shared Memory.
Boost Lockfree has a Single-Producer Single-Consumer queue type (spsc_queue). This is basically what you refer to as a circular buffer.
Here's a demonstration that passes IPC messages (in this case, of type string) using this queue, in a lock-free fashion.
Defining the types
First, let's define our types:
namespace bip = boost::interprocess;

namespace shm
{
    template <typename T>
    using alloc = bip::allocator<T, bip::managed_shared_memory::segment_manager>;

    using char_alloc    = alloc<char>;
    using shared_string = bip::basic_string<char, std::char_traits<char>, char_alloc>;
    using string_alloc  = alloc<shared_string>;

    using ring_buffer = boost::lockfree::spsc_queue<
        shared_string,
        boost::lockfree::capacity<200>
        // alternatively, pass
        // boost::lockfree::allocator<string_alloc>
    >;
}
For simplicity I chose to demo the compile-time-capacity spsc_queue implementation, arbitrarily requesting a capacity of 200 elements.
The shared_string typedef defines a string that will transparently allocate from the shared memory segment, so the strings are also "magically" shared with the other process.
The consumer side
This is the simplest, so:
int main()
{
    // create segment and corresponding allocator
    bip::managed_shared_memory segment(bip::open_or_create, "MySharedMemory", 65536);
    shm::string_alloc char_alloc(segment.get_segment_manager());

    shm::ring_buffer *queue = segment.find_or_construct<shm::ring_buffer>("queue")();
This opens the shared memory area and locates the shared queue if it exists. NOTE: This should be synchronized in real life.
Now for the actual demonstration:
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));

        shm::shared_string v(char_alloc);
        if (queue->pop(v))
            std::cout << "Processed: '" << v << "'\n";
    }
The consumer just infinitely monitors the queue for pending jobs and processes one each ~10ms.
The Producer side
The producer side is very similar:
int main()
{
    bip::managed_shared_memory segment(bip::open_or_create, "MySharedMemory", 65536);
    shm::char_alloc char_alloc(segment.get_segment_manager());

    shm::ring_buffer *queue = segment.find_or_construct<shm::ring_buffer>("queue")();
Again, add proper synchronization to the initialization phase. Also, you would probably make the producer in charge of freeing the shared memory segment in due time. In this demonstration, I just "let it hang". This is nice for testing, see below.
So, what does the producer do?
    for (const char* s : { "hello world", "the answer is 42", "where is your towel" })
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(250));
        queue->push({s, char_alloc});
    }
}
Right, the producer produces precisely 3 messages in ~750ms and then exits.
Note that, consequently, if we do the following (assuming a POSIX shell with job control):
./producer& ./producer& ./producer&
wait
./consumer&
this will print 3x3 messages "immediately", while leaving the consumer running. Doing
./producer& ./producer& ./producer&
again after this will show the messages "trickle in" in real time (in bursts of 3 at ~250ms intervals), because the consumer is still running in the background.
See the full code online in this gist: https://gist.github.com/sehe/9376856

What is the fastest way to notify another thread that data is available? Any alternatives to spinning?

One of my threads writes data to a circular buffer and another thread needs to process this data ASAP. I was thinking of writing such a simple spin. Pseudo-code!
while (true) {
    while (!a[i]) {
        /* do nothing - just keep checking over and over */
    }
    // process b[i]
    i++;
    if (i >= MAX_LENGTH) {
        i = 0;
    }
}
Above, I'm using a to indicate that the data stored in b is available for processing. I should probably also set thread affinity for such a "hot" process. Of course such a spin is very expensive in terms of CPU, but that's OK for me, as my primary requirement is latency.
The question is: should I really write something like that, or do Boost or the STL provide something that is:
Easier to use.
Has roughly the same (or even better?) latency at the same time occupying less CPU resources?
I think that my pattern is so general that there should be some good implementation somewhere.
upd: It seems my question is still too complicated. Let's just consider the case when I need to write some items to an array in arbitrary order, and another thread should read them in the right order as they become available. How do I do that?
upd2
I'm adding a test program to demonstrate what I want to achieve and how. At least on my machine it happens to work. I'm using rand to show that I cannot use a general queue and need an array-based structure:
#include "stdafx.h"
#include <string>
#include <boost/thread.hpp>
#include "windows.h" // for Sleep
const int BUFFER_LENGTH = 10;
int buffer[BUFFER_LENGTH];
short flags[BUFFER_LENGTH];
void ProcessorThread() {
for (int i = 0; i < BUFFER_LENGTH; i++) {
while (flags[i] == 0);
printf("item %i received, value = %i\n", i, buffer[i]);
}
}
int _tmain(int argc, _TCHAR* argv[])
{
memset(flags, 0, sizeof(flags));
boost::thread processor = boost::thread(&ProcessorThread);
for (int i = 0; i < BUFFER_LENGTH * 10; i++) {
int x = rand() % BUFFER_LENGTH;
buffer[x] = x;
flags[x] = 1;
Sleep(100);
}
processor.join();
return 0;
}
Output:
item 0 received, value = 0
item 1 received, value = 1
item 2 received, value = 2
item 3 received, value = 3
item 4 received, value = 4
item 5 received, value = 5
item 6 received, value = 6
item 7 received, value = 7
item 8 received, value = 8
item 9 received, value = 9
Is my program guaranteed to work? How would you redesign it, perhaps using some existing structures from Boost/STL instead of a raw array? Is it possible to get rid of the "spin" without affecting latency?
If the consuming thread is put to sleep, it takes a few microseconds for it to wake up. This is the process scheduler latency you cannot avoid unless the thread busy-spins as you do. The thread also needs to be real-time FIFO so that it is never put to sleep when it is ready to run but has exhausted its time quantum.
So, there is no alternative that could match latency of busy spinning.
(Surprised that you are using Windows; it is best avoided if you are serious about HFT.)
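For reference, a minimal sketch (my addition, not from the answer) of giving the spinning thread the real-time FIFO policy on Linux; it assumes the process is allowed to do so (root or CAP_SYS_NICE):

#include <pthread.h>
#include <sched.h>

// SCHED_FIFO threads are only preempted by higher-priority real-time
// threads; they are never descheduled just for exhausting a time quantum
void make_realtime_fifo()
{
    struct sched_param sp;
    sp.sched_priority = 80; // valid range on Linux is 1..99
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}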
This is what Condition Variables were designed for. std::condition_variable is defined in the C++11 standard library.
What exactly is fastest for your purposes depends on your problem; you can attack it from several angles, but CVs (or derivative implementations) are a good starting point for understanding the subject better and approaching an implementation.
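For illustration (my sketch, not from the answer), a minimal producer/consumer where the consumer sleeps on a std::condition_variable instead of spinning:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::mutex m;
std::condition_variable cv;
std::queue<int> items;

void consumer()
{
    for (int received = 0; received < 5; ++received)
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []{ return !items.empty(); }); // sleeps, no busy-spin
        int v = items.front();
        items.pop();
        std::cout << "got " << v << "\n";
    }
}

void producer()
{
    for (int i = 0; i < 5; ++i)
    {
        {
            std::lock_guard<std::mutex> lock(m);
            items.push(i);
        }
        cv.notify_one(); // wake the consumer
    }
}

int main()
{
    std::thread c(consumer), p(producer);
    p.join();
    c.join();
}

wait() atomically releases the mutex and blocks until notify_one() wakes it, so the consumer burns no CPU while idle; the price is the scheduler wake-up latency discussed in the previous answer.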
Consider using the C++11 standard library if your compiler supports it, or the Boost analog if not. In your case, especially std::future with std::promise.
There is a good book about threading and the C++11 threading library:
Anthony Williams. C++ Concurrency in Action (2012)
Example from cppreference.com:
#include <iostream>
#include <future>
#include <thread>

int main()
{
    // future from a packaged_task
    std::packaged_task<int()> task([](){ return 7; }); // wrap the function
    std::future<int> f1 = task.get_future();           // get a future
    std::thread(std::move(task)).detach();             // launch on a thread

    // future from an async()
    std::future<int> f2 = std::async(std::launch::async, [](){ return 8; });

    // future from a promise
    std::promise<int> p;
    std::future<int> f3 = p.get_future();
    std::thread( [](std::promise<int>& p){ p.set_value(9); },
                 std::ref(p) ).detach();

    std::cout << "Waiting..." << std::flush;
    f1.wait();
    f2.wait();
    f3.wait();
    std::cout << "Done!\nResults are: "
              << f1.get() << ' ' << f2.get() << ' ' << f3.get() << '\n';
}
If you want a fast method, then simply drop down to making OS calls directly. Any C++ library wrapping them is going to be slower.
e.g. On Windows your consumer can call WaitForSingleObject(), and your data-producing thread can wake the consumer using SetEvent(). http://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx
For Unix, here is a similar question with answers: Windows Event implementation in Linux using conditional variables?
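A minimal sketch of that Windows handshake (my own; the event name and timing are made up):

#include <windows.h>
#include <iostream>

HANDLE dataReady; // auto-reset event

DWORD WINAPI consumer(LPVOID)
{
    WaitForSingleObject(dataReady, INFINITE); // sleeps until signaled
    std::cout << "woke up, data is available\n";
    return 0;
}

int main()
{
    dataReady = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, initially unsignaled
    HANDLE t = CreateThread(NULL, 0, consumer, NULL, 0, NULL);
    Sleep(100);          // pretend to produce some data
    SetEvent(dataReady); // wake the consumer
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
    CloseHandle(dataReady);
    return 0;
}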
Do you really need threading?
A single-threaded app is trivially simple and eliminates all the issues with thread safety and the overhead of launching threads. I did a study of threaded vs. non-threaded code appending text to a log file; the non-threaded code was better in every measure of performance.

How to implement two periodical processes in C++ under Linux?

I am doing real-time programming in C++ under Linux.
I have two processes, let me say A and B. Process A is started periodically, every 5ms; process B is started every 10ms. Process A changes some data, and process B reads that data and displays it.
I am confused about how to run these processes periodically, and whether I should have two .cpp programs, one for each process.
I think that, if possible, creating a single process with two threads might be a good solution, since it might be much easier for them to share resources and synchronize their data.
But if you need more than this, then I think you need to be clearer when stating your problem.
As a different solution, to create two separate processes that communicate with each other, all you really have to worry about is the IPC, not how these processes are created; i.e. just create the two processes, A and B, as you normally would (system(), fork(), popen(), etc.).
Now, the easiest way to make them talk to each other is using named pipes. They are one-way, so you'll have to create one for A -> B and another for B -> A. They don't need any locking or synchronization, since that is more or less done by the kernel/libc themselves. Once you set up the pipes, you can use them as though they were simple network connections/sockets.
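A minimal sketch of the A -> B direction (my own; the path /tmp/a_to_b is made up). Process B would open the same path with O_RDONLY and read() from it:

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    mkfifo("/tmp/a_to_b", 0666);            // create the FIFO once; EEXIST is fine
    int fd = open("/tmp/a_to_b", O_WRONLY); // blocks until B opens the read end
    const char msg[] = "data ready";
    write(fd, msg, sizeof msg);             // the kernel serializes pipe writes
    close(fd);
    return 0;
}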
If you need 'MORE POWER(TM) (C)2010', then you'll have to use Shared Memory and Sempahores, or Message queues. They are, however, much more complicated, so I suggest you look into named pipes first.
Now, for the periodic running, the best way is to use usleep(T) in each program's main loop, where the time T is calculated from the last time you ran instead of being a fixed constant, so that if a run took longer than expected, you sleep less and your program still runs every X milliseconds.
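A minimal sketch of that timing scheme (mine, assuming process A's 5 ms period and a hypothetical work() function):

#include <time.h>
#include <unistd.h>

// run work() once every 5 ms, sleeping only for the time left in the period
void periodic_loop(void (*work)(void))
{
    const long period_us = 5000;
    struct timespec start, end;
    while (1)
    {
        clock_gettime(CLOCK_MONOTONIC, &start);
        work();
        clock_gettime(CLOCK_MONOTONIC, &end);
        long elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L
                        + (end.tv_nsec - start.tv_nsec) / 1000L;
        if (elapsed_us < period_us)
            usleep(period_us - elapsed_us); // shorter sleep if work ran long
    }
}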
Another way of doing it is using SIGALRM, like this:
#include <iostream>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t __semAlarm;

static void* waitForAlarm(void*)
{
    while( true )
    {
        sem_wait( &__semAlarm );
        std::cout << "Got alarm" << std::endl;
    }
    return NULL;
}

typedef void (*sighandler_t)(int);
static sighandler_t __handler = NULL;
static int count = 0;

static void sighandler(int signal)
{
    if ( signal == SIGALRM )
    {
        count++;
        sem_post( &__semAlarm );
        alarm(3);
    }
    else if ( __handler )
        __handler( signal );
}

int main(int argc, char **argv)
{
    if ( sem_init( &__semAlarm, 0, 0 ) != 0 )
    {
        std::cerr << strerror( errno ) << std::endl;
        return -1;
    }

    pthread_t thread;
    if ( pthread_create( &thread, NULL, waitForAlarm, NULL ) != 0 )
    {
        std::cerr << strerror( errno ) << std::endl;
        return -1;
    }

    __handler = signal( SIGALRM, sighandler );
    alarm(3);

    while( count < 5 )
    {
        sleep(1);
    }

    return 0;
}
You don't really need the thread in there, but it might be a good idea if your program does more than one thing, so that one task does not affect the timing of the critical one. Anyway, since I already had that example set up that way, it was easier to just copy-paste it as it was. ;-)
Edit:
Now that I re-read my post, I noticed a fatal flaw: alarm() can only handle 1-second precision, and you need ms precision. In that case, if you choose this solution, you'll have to use timer_create(), which is very similar to alarm() but can handle ms precision. On Linux, man 2 timer_create will give you an example of how to use it.
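A minimal sketch of the timer_create() route (my addition; a 5 ms period delivering SIGALRM, so the handler above keeps working; link with -lrt on older glibc):

#include <signal.h>
#include <string.h>
#include <time.h>

int start_5ms_timer(void)
{
    timer_t timerid;
    struct sigevent sev;
    struct itimerspec its;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL; // deliver a signal on each expiry
    sev.sigev_signo = SIGALRM;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timerid) != 0)
        return -1;

    its.it_value.tv_sec = 0;
    its.it_value.tv_nsec = 5 * 1000 * 1000; // first expiry after 5 ms
    its.it_interval = its.it_value;         // then every 5 ms, re-armed automatically
    return timer_settime(timerid, 0, &its, NULL);
}

Note that since the interval re-arms the timer automatically, the alarm(3) call inside the signal handler above becomes unnecessary.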