Publisher/Subscriber with changing subscriptions during loop - c++

This is more of a general design query. I have implemented a publish/subscribe pattern by maintaining a list of subscribers. When an event to publish occurs, I loop through the subscribers and push the event to each of them in turn.
My problem occurs when, because of that publication, somewhere in the depths of the software another component, or even the subscribed component itself, decides to unsubscribe. Doing so invalidates my iterator and causes crashes.
What is the best way to solve this? I have thought of wrapping the whole publication loop in a try/catch block, but that means some subscribers miss the particular publication during which someone unsubscribed, and it seems a bit over the top. Then I tried feeding it back: I turned the void publish call into a bool publish call that returns true when the subscriber wants to be deleted. That works for that case, but not if another subscriber unsubscribes. I also considered "caching" unsubscription requests somewhere and releasing them when the loop is done, but that seems like overkill. Finally, I considered storing the iterator as a class member so that I can manipulate it from outside, but that gets messy (say you unsubscribe subscriber 1, the iterator points at 2, and the container is a vector: then the iterator would have to be decremented). I think I might prefer one of the latter two solutions, but neither seems ideal.
Is this a common problem? Is there a more elegant solution?

You could either disallow subscription operations during publication, or you could use an appropriate data structure to hold your subscription list, or both.
Assuming that you keep your subscribers in a std::list, you could run your loop thus:
for(iterator_type it = subs.begin(); it != subs.end(); ) {
    iterator_type next = it;
    ++next;
    it->notifier();
    it = next;
}
That way, if the current item is removed, you still have a valid iterator in next. Of course, you still can't allow arbitrary removal (what if next is removed?) during publication.
To allow arbitrary removal, mark an item as invalid and defer its removal from the list until it is safe to do so:
// ... publication loop ...
dontRemoveItems = true;
for(iterator_type it = subs.begin(); it != subs.end(); ++it) {
    if(it->valid)
        it->notifier();
}
// The loop is over, so erase everything that was marked invalid
subs.erase(std::remove_if(subs.begin(), subs.end(), IsNotValid), subs.end());
dontRemoveItems = false;
and elsewhere:
// ... removal code ...
if(dontRemoveItems) item->valid = false;
else subs.erase(item);
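The deferred-removal idea above can be put together as a small self-contained sketch. The names here (Publisher, Handle, demo) are hypothetical and not from the question, and the sketch assumes that publish() is never called re-entrantly from inside a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical publisher; all names are illustrative.
class Publisher {
public:
    using Callback = std::function<void(int)>;
    using Handle = int;

    Handle subscribe(Callback cb) {
        subs_.push_back(Entry{nextId_, std::move(cb), true});
        return nextId_++;
    }

    // Safe to call from inside a callback: during publication the entry
    // is only marked invalid; it is erased once the loop has finished.
    void unsubscribe(Handle h) {
        for (auto it = subs_.begin(); it != subs_.end(); ++it) {
            if (it->id == h) {
                if (publishing_) it->valid = false;
                else subs_.erase(it);
                return;
            }
        }
    }

    // Assumption: publish() is not called from within a callback.
    void publish(int event) {
        publishing_ = true;
        for (auto& s : subs_) {
            if (s.valid) s.cb(event);
        }
        publishing_ = false;
        subs_.remove_if([](const Entry& e) { return !e.valid; });
    }

    std::size_t size() const { return subs_.size(); }

private:
    struct Entry { Handle id; Callback cb; bool valid; };
    std::list<Entry> subs_;
    Handle nextId_ = 0;
    bool publishing_ = false;
};

// Demo: subscriber B unsubscribes both itself and C while being notified.
std::vector<std::string> demo() {
    Publisher p;
    std::vector<std::string> log;
    Publisher::Handle b = 0, c = 0;
    p.subscribe([&](int) { log.push_back("A"); });
    b = p.subscribe([&](int) {
        log.push_back("B");
        p.unsubscribe(b);
        p.unsubscribe(c);
    });
    c = p.subscribe([&](int) { log.push_back("C"); });
    p.publish(1);  // A and B run; C was invalidated before its turn
    p.publish(2);  // only A is still subscribed
    return log;
}
```

Because std::list keeps iterators to other elements valid on insertion, subscribing during publication also works in this sketch; with a std::vector that would need separate handling.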

Polymer not unlistening properly

I use Polymer.Templatizer to stamp templates of paper-input collections into a custom-element which has a listener 'change':'_doStuff'.
Basically, when I stamp 20 paper-inputs via Polymer.dom(this).appendChild(template.root), a bunch of listeners are added, as you can see in the graph.
Then I call another function that goes through all of those elements, does Polymer.dom(paperInput.parentNode).removeChild(paperInput), and adds another set of inputs. But for some reason it just doesn't detach the listeners on those, and the heap grows with every iteration...
The change listener on the host element, I believe, is not detached either.
What am I doing wrong?
EDIT: I know what it is. It's not a garbage collection problem: Polymer creates anonymous Polymer.Base instances when templatizing and actually puts all of the template's children into those. Of course, the instances are not removed in any way. I wish I knew how to kill those without reducing the performance of the app. By defining custom elements instead? That looks like overhead to me...
A way to free your memory could be to unlisten all of your listeners before you detach your elements, so that they are unreferenced first. For example:
for (var i = 0; i < inputList.length; i++) {
    // The handler name must match the one used when listening ('_doStuff' in the question)
    this.unlisten(inputList[i], 'change', '_doStuff');
    Polymer.dom(inputList[i].parentNode).removeChild(inputList[i]);
}

Querying a growing data-set

We have a data set that grows while the application is processing the data set. After a long discussion we have come to the decision that we do not want blocking or asynchronous APIs at this time, and we will periodically query our data store.
We thought of two options to design an API for querying our storage:
A query method returns a snapshot of the data and a flag indicating whether we might have more data. When we finish iterating over the last returned snapshot, we query again to get another snapshot for the rest of the data.
A query method returns a "live" iterator over the data, and when this iterator advances it returns one of the following options: Data is available, No more data, Might have more data.
We are using C++ and we borrowed the .NET style enumerator API for reasons which are out of scope for this question. Here is some code to demonstrate the two options. Which option would you prefer?
/* ======== FIRST OPTION ============== */
// similar to the familiar .NET enumerator.
class IFooEnumerator
{
public:
    // true --> A data element may be accessed using the Current() method
    // false --> End of sequence. Calling Current() is an invalid operation.
    virtual bool MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IFooEnumerator() {}
};
enum class Availability
{
    EndOfData,
    MightHaveMoreData,
};
class IDataProvider
{
public:
    // Query params allow specifying the ID of the starting element. Here is the intended usage pattern:
    // 1. Call GetFoo() without specifying a starting point.
    // 2. Process all elements returned by IFooEnumerator until it ends.
    // 3. Check the availability.
    // 3.1 MightHaveMoreData --> Invoke GetFoo() again after some time by specifying the last processed element as the starting point
    // and repeat steps (2) and (3)
    // 3.2 EndOfData --> The data set will not grow any more and we know that we have finished processing.
    virtual std::tuple<std::unique_ptr<IFooEnumerator>, Availability> GetFoo(query-params) = 0;
};
/* ====== SECOND OPTION ====== */
enum class Availability
{
    HasData,
    MightHaveMoreData,
    EndOfData,
};
class IGrowingFooEnumerator
{
public:
    // HasData:
    // We may access the current data element by invoking Current()
    // EndOfData:
    // The data set has finished growing and no more data elements will arrive later
    // MightHaveMoreData:
    // The data set will grow and we need to continue calling MoveNext() periodically (preferably after a short delay)
    // until we get a "HasData" or "EndOfData" result.
    virtual Availability MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IGrowingFooEnumerator() {}
};
class IDataProvider
{
public:
    virtual std::unique_ptr<IGrowingFooEnumerator> GetFoo(query-params) = 0;
};
Update
Given the current answers, I have some clarifications. The debate is mainly over the interface: its expressiveness and intuitiveness in representing queries for a growing data set that at some point in time will stop growing. Both interfaces can be implemented without race conditions (at least we believe so) because of the following properties:
The 1st option can be implemented correctly if the pair of the iterator + the flag represent a snapshot of the system at the time of querying. Getting snapshot semantics is a non-issue, as we use database transactions.
The 2nd option can be implemented given a correct implementation of the 1st option. The "MoveNext()" of the 2nd option will, internally, use something like the 1st option and re-issue the query if needed.
The data-set can change from "Might have more data" to "End of data", but not vice versa. So if we, wrongly, return "Might have more data" because of a race condition, we just get a small performance overhead because we need to query again, and the next time we will receive "End of data".
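To make the relationship between the two options concrete, here is a minimal sketch in which the second option's MoveNext() re-issues a first-option snapshot query whenever the current snapshot is exhausted. Store, GrowingEnumerator and demo are hypothetical names, the element type is simplified to int, and an in-memory vector stands in for the database transaction that would provide the snapshot semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum class Availability { HasData, MightHaveMoreData, EndOfData };

// Hypothetical in-memory stand-in for the data store.
struct Store {
    std::vector<int> items;
    bool finished = false;  // once true, the data set never grows again

    // Option 1: a snapshot of everything from index `from`, plus a flag
    // describing the store at the moment of the query.
    std::pair<std::vector<int>, Availability> query(std::size_t from) const {
        std::vector<int> snap;
        if (from < items.size()) {
            snap.assign(items.begin() + static_cast<std::ptrdiff_t>(from),
                        items.end());
        }
        return {snap, finished ? Availability::EndOfData
                               : Availability::MightHaveMoreData};
    }
};

// Option 2, built on option 1.
class GrowingEnumerator {
public:
    explicit GrowingEnumerator(const Store& s) : store_(s) {}

    Availability MoveNext() {
        if (next_ == snap_.size()) {  // current snapshot exhausted
            if (last_ == Availability::EndOfData) return last_;
            auto r = store_.query(consumed_);  // re-issue the query
            snap_ = std::move(r.first);
            last_ = r.second;
            next_ = 0;
            if (snap_.empty()) return last_;
        }
        current_ = snap_[next_++];
        ++consumed_;
        return Availability::HasData;
    }

    int Current() const { return current_; }

private:
    const Store& store_;
    std::vector<int> snap_;
    std::size_t next_ = 0;      // next index inside the snapshot
    std::size_t consumed_ = 0;  // elements handed out so far
    Availability last_ = Availability::MightHaveMoreData;
    int current_ = 0;
};

// Demo: drain, let the store grow and finish, then drain again.
std::pair<std::vector<int>, Availability> demo() {
    Store store;
    store.items = {1, 2};
    GrowingEnumerator e(store);
    std::vector<int> seen;
    while (e.MoveNext() == Availability::HasData) seen.push_back(e.Current());
    // The loop stopped on MightHaveMoreData; a real caller would wait here.
    store.items.push_back(3);
    store.finished = true;
    Availability a;
    while ((a = e.MoveNext()) == Availability::HasData) seen.push_back(e.Current());
    return {seen, a};
}
```

In this sketch a stale MightHaveMoreData only costs one extra query, which matches the last property listed above.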
"Invoke GetFoo() again after some time by specifying the last processed element as the starting point"
How are you planning to do that? If it's using the earlier-returned IFooEnumerator, then functionally the two options are equivalent. Otherwise, letting the caller destroy the "enumerator" then however-long afterwards call GetFoo() to continue iteration means you're losing your ability to monitor the client's ongoing interest in the query results. It might be that right now you have no need for that, but I think it's poor design to exclude the ability to track state throughout the overall result processing.
It really depends on many things whether the overall system will at all work (not going into details about your actual implementation):
No matter how you twist it, there will be a race condition between checking for "Is there more data" and more data being added to the system. Which means that it's possibly pointless to try to capture the last few data items?
You probably need to limit the number of repeated runs for "is there more data", or you could end up in an endless loop of "new data came in while processing the last lot".
How easy is it to know whether data has been updated? If all the updates are "new items" with new IDs that are sequentially higher, you can simply query "Is there data above X", where X is your last ID. But if you are, for example, counting how many items in the data have property Y set to value A, and data may be updated anywhere in the database at any time (e.g. a database of where taxis are at present, which gets updated via GPS every few seconds and has thousands of cars), it may be hard to determine which cars have had updates since the last time you read the database.
As to your implementation, in option 2, I'm not sure what you mean by the MightHaveMoreData state - either it has, or it hasn't, right? Repeated polling for more data is a bad design in this case - given that you will never be able to say 100% certain that there hasn't been "new data" provided in the time it took from fetching the last data until it was processed and acted on (displayed, used to buy shares on the stock market, stopped the train or whatever it is that you want to do once you have processed your new data).
A read-write lock could help. Many readers can access the data set simultaneously, but only one writer can.
The idea is simple:
- when you need read-only access, the reader takes a read lock, which can be shared with other readers but excludes writers;
- when you need write access, the writer takes a write lock, which excludes both readers and writers.
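As a rough illustration of the read-write lock idea, the following sketch uses C++17's std::shared_mutex. GrowingSet and demo are hypothetical names, and the polling loop spins without a delay only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical growing data set guarded by a read-write lock (C++17).
class GrowingSet {
public:
    void append(int v) {
        std::unique_lock<std::shared_mutex> lock(m_);  // exclusive: writers
        data_.push_back(v);
    }

    // Copies everything from index `from` onwards under a shared lock.
    std::vector<int> snapshotFrom(std::size_t from) const {
        std::shared_lock<std::shared_mutex> lock(m_);  // shared: readers
        if (from >= data_.size()) return {};
        return std::vector<int>(
            data_.begin() + static_cast<std::ptrdiff_t>(from), data_.end());
    }

private:
    mutable std::shared_mutex m_;
    std::vector<int> data_;
};

// Demo: one writer appends 1000 items while the reader polls for new ones.
std::size_t demo() {
    GrowingSet set;
    std::thread writer([&set] {
        for (int i = 0; i < 1000; ++i) set.append(i);
    });
    std::size_t seen = 0;
    while (seen < 1000) seen += set.snapshotFrom(seen).size();
    writer.join();
    return seen;
}
```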

data structure for a circuit switching?

I would like to create something like this :
I have a module that does something like 'circuit switching' for a stream of messages. That is, it has a single inport and multiple outports. Once a message arrives at the inport, an outport is selected based on some logic (the logic is not important in the context of this question). The module then checks whether there is an ongoing message transfer on that outport (for the first message, there won't be any). If there is no transfer, the message is sent to that outport; otherwise, it is kept in a queue for that particular outport. I need to decide on a data structure for this communication. Please advise.
My idea is to have a map of outports and corresponding queues.
queue<message> m_incoming_queue;
typedef map<outport*,m_incoming_queue> transaction_map
If this is a good solution, I want to know how to create a queue at runtime. That is, I don't know in advance how many outports there will be; I create outports as required.
Maybe something like:
// At the beginning
typedef queue<message> MessageQueue;
typedef map<outport*, MessageQueue> transaction_map;
transaction_map tm; // Create the transaction map
// On receipt of each message
// (Some logic that determines outport* op and message m)
if(tm.count(op) == 0)
{
    // There is no queue for this outport yet, so create one and insert it
    tm.insert(transaction_map::value_type(op, MessageQueue()));
}
// A queue now exists for this outport, so add to it
tm[op].push(m);
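The snippet above only covers queueing. A slightly fuller sketch of the behaviour described in the question (send immediately when the outport is idle, queue otherwise, and drain the queue when a transfer completes) might look like this. Outport, Switch, transferDone and demo are hypothetical names, and messages are simplified to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical outport: records what was transmitted on it.
struct Outport {
    bool busy = false;
    std::vector<std::string> sent;
};

class Switch {
public:
    // Send immediately if the outport is idle, otherwise queue the message.
    void route(Outport* op, const std::string& m) {
        if (op->busy) {
            pending_[op].push(m);  // operator[] creates the queue on demand
            return;
        }
        start(op, m);
    }

    // To be called when the current transfer on `op` has finished.
    void transferDone(Outport* op) {
        op->busy = false;
        auto it = pending_.find(op);
        if (it != pending_.end() && !it->second.empty()) {
            std::string next = it->second.front();
            it->second.pop();
            start(op, next);
        }
    }

    std::size_t queued(Outport* op) const {
        auto it = pending_.find(op);
        return it == pending_.end() ? 0 : it->second.size();
    }

private:
    void start(Outport* op, const std::string& m) {
        op->busy = true;
        op->sent.push_back(m);
    }

    std::map<Outport*, std::queue<std::string>> pending_;
};

// Demo: the second message for outport `a` waits until the first is done.
std::vector<std::string> demo() {
    Outport a, b;
    Switch sw;
    sw.route(&a, "m1");  // a is idle: sent at once
    sw.route(&a, "m2");  // a is busy: queued
    sw.route(&b, "m3");  // b is idle: sent at once
    assert(sw.queued(&a) == 1);
    sw.transferDone(&a); // m1 finished, so m2 is sent now
    return a.sent;
}
```

Since std::map::operator[] default-constructs a missing value, queues for new outports appear at runtime without any explicit insert.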

Asynchronous network calls

I made a class that has an asynchronous OpenWebPage() function. Once you call OpenWebPage(someUrl), a handler gets called: OnPageLoad(reply). I have been using a global variable called lastAction to take care of stuff once a page is loaded; the handler checks what lastAction is and calls an appropriate function. For example:
this->lastAction = "homepage";
this->OpenWebPage("http://www.hardwarebase.net");
void OnPageLoad(reply)
{
    if(this->lastAction == "homepage")
    {
        this->lastAction = "login";
        this->Login(); // POSTs a form and OnPageLoad gets called again
    }
    else if(this->lastAction == "login")
    {
        this->PostLogin(); // Checks whether we logged in properly, sets lastAction to "new topic" and goes to the new topic URL
    }
    else if(this->lastAction == "new topic")
    {
        this->WriteTopic(); // Does some more stuff ... you get the point
    }
}
Now, this is rather hard to write and keep track of when we have a large number of "actions". When I was doing stuff in Python (synchronously) it was much easier, like:
OpenWebPage("http://hardwarebase.net")  # Stores the loaded page HTML in self.page
OpenWebPage("http://hardwarebase.net/login", {"user": username, "pw": password})  # POSTs a form
if self.page == ...:  # now do some more checks etc.
    # do something more
Imagine now that I have a queue class which holds the actions: homepage, login, new topic. How am I supposed to execute all those actions (in the proper order, one after another!) via the asynchronous callback? The first example is obviously completely hard-coded.
I hope you understand my question, because frankly I fear this is the worst question ever written :x
P.S. All this is done in Qt.
You are inviting all manner of bugs if you try and use a single member variable to maintain state for an arbitrary number of asynchronous operations, which is what you describe above. There is no way for you to determine the order that the OpenWebPage calls complete, so there's also no way to associate the value of lastAction at any given time with any specific operation.
There are a number of ways to solve this, e.g.:
Encapsulate web page loading in an immutable class that processes one page per instance
Return an object from OpenWebPage which tracks progress and stores the operation's state
Fire a signal when an operation completes and attach the operation's context to the signal
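One way to read the suggestions above is to let each operation carry its own context and start the next action only when the previous reply has arrived. The following is a framework-free sketch of that idea. FakeLoop is a hypothetical stand-in for the Qt event loop and OpenWebPage(), and Action, ActionRunner and demo are likewise illustrative names; in Qt the reply would arrive through a signal instead.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop plus an asynchronous page load.
struct FakeLoop {
    std::deque<std::function<void()>> tasks;

    // Returns at once; the reply is delivered later from run().
    void openWebPage(const std::string& url,
                     std::function<void(std::string)> onLoad) {
        tasks.push_back([url, onLoad] { onLoad("reply from " + url); });
    }

    void run() {
        while (!tasks.empty()) {
            auto t = std::move(tasks.front());
            tasks.pop_front();
            t();
        }
    }
};

// One step: the URL to load and what to do with the reply.
struct Action {
    std::string url;
    std::function<void(const std::string&)> handle;
};

// Runs the queued actions strictly one after another.
class ActionRunner {
public:
    ActionRunner(FakeLoop& loop, std::deque<Action> actions)
        : loop_(loop), actions_(std::move(actions)) {}

    void start() { next(); }

private:
    void next() {
        if (actions_.empty()) return;
        Action a = std::move(actions_.front());
        actions_.pop_front();
        loop_.openWebPage(a.url, [this, a](std::string reply) {
            a.handle(reply);  // the context travels with the callback
            next();           // only now start the following action
        });
    }

    FakeLoop& loop_;
    std::deque<Action> actions_;
};

// Demo: three actions whose replies are recorded in arrival order.
std::vector<std::string> demo() {
    FakeLoop loop;
    std::vector<std::string> log;
    auto record = [&log](const std::string& r) { log.push_back(r); };
    std::deque<Action> actions;
    actions.push_back({"homepage", record});
    actions.push_back({"login", record});
    actions.push_back({"new topic", record});
    ActionRunner runner(loop, std::move(actions));
    runner.start();
    loop.run();
    return log;
}
```

Because each callback captures its own Action, no shared lastAction variable is needed to work out which operation a reply belongs to.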
You need to add "return" statement in the end of every "if" branch: in your code, all "if" branches are executed in the first OnPageLoad call.
Generally, asynchronous state management is always more complicated than synchronous. Consider replacing the lastAction type with an enumeration. Also, if the OnPageLoad thread context is arbitrary, you need to synchronize access to global variables.

While using ConcurrentQueue, trying to dequeue while looping through in parallel

I am using the parallel data structures in my .NET 4 application and I have a ConcurrentQueue that gets added to while I am processing through it.
I want to do something like:
personqueue.AsParallel().WithDegreeOfParallelism(20).ForAll(i => ... );
as I make database calls to save the data, so I am limiting the number of concurrent threads.
But, I expect that the ForAll isn't going to dequeue, and I am concerned about just doing
ForAll(i => {
    personqueue.TryDequeue(...);
    ...
});
as there is no guarantee that I am popping off the correct one.
So, how can I iterate through the collection and dequeue in a parallel fashion?
Or, would it be better to use PLINQ to do this processing, in parallel?
Well, I'm not 100% sure what you are trying to achieve here. Are you trying to just dequeue all items until nothing is left? Or just dequeue lots of items in one go?
The first probably unexpected behavior starts with this statement:
theQueue.AsParallel()
For a ConcurrentQueue, you get a 'snapshot' enumerator. So when you iterate over a concurrent queue, you only iterate over the snapshot, not the 'live' queue.
In general I think it's not a good idea to iterate over something you're changing during the iteration.
So another solution would look like this:
// This way it is clearer that we dequeue at most theQueue.Count items.
// However, after this the queue is probably not empty,
// or it may already have been emptied earlier.
Parallel.For(0, theQueue.Count,
    new ParallelOptions() {MaxDegreeOfParallelism = 20},
    i => {
        // TryDequeue takes an out parameter and returns false if the queue is empty
        if (theQueue.TryDequeue(out var item)) {
            // and stuff
        }
    });
This avoids manipulating something while iterating over it. However, after that statement the queue can still contain data that was added during the for-loop.
To get the queue empty at a given moment in time, you probably need a little more work. Here's a really ugly solution. While the queue still has items, create new tasks. Each task dequeues from the queue for as long as it can. At the end, we wait for all tasks to finish. To limit the parallelism, we never create more than 20 tasks.
// Probably a kitty died because of this ugly code ;)
// However, this code tries to get the queue empty in a very aggressive way
Action consumeFromQueue = () =>
{
    while (tt.TryDequeue(out var item))
    {
        ; // do your stuff
    }
};
// A list avoids null slots, which Task.WaitAll would reject
var allRunningTasks = new List<Task>();
for(int i=0;i<MaxParallelism && tt.Count>0;i++)
{
    allRunningTasks.Add(Task.Factory.StartNew(consumeFromQueue));
}
Task.WaitAll(allRunningTasks.ToArray());
If you are aiming at a high-throughput real site and you don't have to do immediate DB updates, you'll be much better off going for a very conservative solution rather than extra layers of libraries.
Make a fixed-size array (guesstimate the size, say 1000 items or N seconds' worth of requests) and an interlocked index so that requests just put data into slots and return. When one block gets filled (keep checking the count), make another one and spawn an async delegate to process the block that just got filled and send it to SQL. Depending on the structure of your data, that delegate can pack all the data into comma-separated arrays, maybe even simple XML (you have to test the perf of that one, of course), and send them to a SQL sproc, which should do its best to process them record by record, never holding a big lock. If it gets heavy, you can split your block into several smaller blocks. The key thing is that you have minimized the number of requests to SQL, always kept one degree of separation, and didn't even have to pay the price for a thread pool; you probably won't need more than 2 async threads at all.
That's going to be a lot faster than fiddling with Parallel-s.