Given a System that contains two components, A and B, where the System starts up A and B concurrently: A can go through the states {A.Starting, A.Ready}, and B can be in the states {B.Starting, B.DoingX, B.DoingY}. (Events that transition between A's and B's states are named accordingly: B.doingx => B goes to B.DoingX, etc.)
I want to model the following:
While A is in A.Starting, or B is in B.Starting, the System is "Starting"
The System is in state "DoingX" when A is in A.Ready and B is in B.DoingX
The System is in state "DoingY" when A is in A.Ready and B is in B.DoingY
If I'm not mistaken, the fork/join pseudo-states could be used here.
But do these model elements have the declarative semantics of the composed state mentioned above? Is there another way to model this?
(Note: the diagrams are from http://yuml.me)
Why don't you just pull these apart? Here's another idea on how you could model it (assuming I understood it correctly):
a state "Starting" that contains the states you refer to as A.Starting and B.Starting in parallel regions (you can use fork/joins here, or just rely on the default behavior of all regions being activated when the "Starting" state is entered)
another state "Doing" that contains a region with your "A.Ready" state and another parallel region, that contains the two states "B.DoingX" and "B.DoingY".
If you really need to have an overall "DoingX" state, then you may have to create two states that correspond to A.Ready.
Anyway, from a broader perspective: I believe your point of view is a little off here when you say that "the System is in state ...". Rather, the system modeled by such a state machine is in a set of states. So normally, I would be perfectly happy to say that "the system is currently in A.Ready and B.DoingX".
Maybe all you need is a change of terminology. What about this:
The system is in configuration "DoingX" when the A.Ready and B.DoingX states are active?
In response to the comment: yes, this is standard. Here's the corresponding part of the UML superstructure specification (version 2.4 beta):
In a hierarchical state machine more than one state can be active at the same time. [...] the current active “state” is actually represented by a set of trees of states starting with the top-most states of the root regions down to the innermost active substate. We refer to such a state tree as a state configuration.
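To make the "configuration" idea concrete, here is a minimal Python sketch (Python rather than UML; the enum and function names are made up for illustration). Each region keeps its own active state, and the System's named configuration is derived from the pair instead of being stored as a state of its own, which is roughly the declarative reading the question asks about.

```python
from enum import Enum


class A(Enum):
    STARTING = "A.Starting"
    READY = "A.Ready"


class B(Enum):
    STARTING = "B.Starting"
    DOING_X = "B.DoingX"
    DOING_Y = "B.DoingY"


def system_configuration(a: A, b: B) -> str:
    """Name the System's configuration from the active state of each region.

    Nothing extra is stored: the named configuration is derived from the pair."""
    if a is A.STARTING or b is B.STARTING:
        return "Starting"
    # Past this point A is Ready and B is doing X or Y.
    return "DoingX" if b is B.DOING_X else "DoingY"


# Enumerate every combination of the two regions' active states.
for a_state in A:
    for b_state in B:
        print(f"{a_state.value} + {b_state.value} -> {system_configuration(a_state, b_state)}")
```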
I'm having trouble getting my head around the purpose of supply {…} blocks/the on-demand supplies that they create.
Live supplies (that is, the types that come from a Supplier and get new values whenever that Supplier emits a value) make sense to me – they're a version of asynchronous streams that I can use to broadcast a message from one or more senders to one or more receivers. It's easy to see use cases for responding to a live stream of messages: I might want to take an action every time I get a UI event from a GUI interface, or every time a chat application broadcasts that it has received a new message.
But on-demand supplies don't make a similar amount of sense. The docs say that
An on-demand broadcast is like Netflix: everyone who starts streaming a movie (taps a supply), always starts it from the beginning (gets all the values), regardless of how many people are watching it right now.
Ok, fair enough. But why/when would I want those semantics?
The examples also leave me scratching my head a bit. The Concurrency page currently provides three examples of a supply block, but two of them just emit the values from a for loop. The third is a bit more detailed:
my $bread-supplier = Supplier.new;
my $vegetable-supplier = Supplier.new;
my $supply = supply {
    whenever $bread-supplier.Supply {
        emit("We've got bread: " ~ $_);
    };
    whenever $vegetable-supplier.Supply {
        emit("We've got a vegetable: " ~ $_);
    };
}
$supply.tap( -> $v { say "$v" });
$vegetable-supplier.emit("Radish"); # OUTPUT: «We've got a vegetable: Radish»
$bread-supplier.emit("Thick sliced"); # OUTPUT: «We've got bread: Thick sliced»
$vegetable-supplier.emit("Lettuce"); # OUTPUT: «We've got a vegetable: Lettuce»
There, the supply block is doing something. Specifically, it's reacting to the input of two different (live) Suppliers and then merging them into a single Supply. That does seem fairly useful.
… except that if I want to transform the output of two Suppliers and merge their output into a single combined stream, I can just use
my $supply = Supply.merge:
$bread-supplier.Supply.map( { "We've got bread: $_" }),
$vegetable-supplier.Supply.map({ "We've got a vegetable: $_" });
And, indeed, if I replace the supply block in that example with the map/merge above, I get exactly the same output. Further, neither the supply block version nor the map/merge version produces any output if the tap is moved below the calls to .emit, which shows that the "on-demand" aspect of supply blocks doesn't really come into play here.
At a more general level, I don't believe the Raku (or Cro) docs provide any examples of a supply block that isn't either in some way transforming the output of a live Supply or emitting values based on a for loop or Supply.interval. None of those seem like especially compelling use cases, other than as a different way to transform Supplys.
Given all of the above, I'm tempted to mostly write off the supply block as a construct that isn't all that useful, other than as a possible alternate syntax for certain Supply combinators. However, I have it on fairly good authority that
while Supplier is often reached for, many times one would be better off writing a supply block that emits the values.
Given that, I'm willing to hazard a pretty confident guess that I'm missing something about supply blocks. I'd appreciate any insight into what that might be.
Given you mentioned Supply.merge, let's start with that. Imagine it wasn't in the Raku standard library, and we had to implement it. What would we have to take care of in order to reach a correct implementation? At least:
Produce a Supply result that, when tapped, will...
Tap (that is, subscribe to) all of the input supplies.
When one of the input supplies emits a value, emit it to our tapper...
...but make sure we follow the serial supply rule, which is that we only emit one message at a time; it's possible that two of our input supplies will emit values at the same time from different threads, so this isn't an automatic property.
When all of our supplies have sent their done event, send the done event also.
If any of the input supplies we tapped sends a quit event, relay it, and also close the taps of all of the other input supplies.
Make very sure we don't have any odd races that will lead to breaking the supply grammar emit* [done|quit].
When a tap on the resulting Supply we produce is closed, be sure to close the tap on all (still active) input supplies we tapped.
Good luck!
So how does the standard library do it? Like this:
method merge(*@s) {
    @s.unshift(self) if self.DEFINITE;  # add if instance method
    # [I elided optimizations for when there are 0 or 1 things to merge]
    supply {
        for @s {
            whenever $_ -> \value { emit(value) }
        }
    }
}
The point of supply blocks is to greatly ease correctly implementing reusable operations over one or more Supplys. The key risks it aims to remove are:
Not correctly handling concurrently arriving messages in the case that we have tapped more than one Supply, potentially leading us to corrupt state (since many supply combinators we might wish to write will have state too; merge is so simple as not to). A supply block promises us that we'll only be processing one message at a time, removing that danger.
Losing track of subscriptions, and thus leaking resources, which will become a problem in any longer-running program.
The second is easy to overlook, especially when working in a garbage-collected language like Raku. Indeed, if I start iterating some Seq and then stop doing so before reaching the end of it, the iterator becomes unreachable and the GC eats it in a while. If I'm iterating over lines of a file and there's an implicit file handle there, I risk the file not being closed in a very timely way and might run out of handles if I'm unlucky, but at least there's some path to it getting closed and the resources released.
Not so with reactive programming: the references point from producer to consumer, so if a consumer "stops caring" but hasn't closed the tap, the producer will retain its reference to the consumer (causing a memory leak) and keep sending it messages (doing throwaway work). This can eventually bring down an application. The Cro chat example that was linked is a case in point:
my $chat = Supplier.new;
get -> 'chat' {
    web-socket -> $incoming {
        supply {
            whenever $incoming -> $message {
                $chat.emit(await $message.body-text);
            }
            whenever $chat -> $text {
                emit $text;
            }
        }
    }
}
What happens when a WebSocket client disconnects? The tap on the Supply we returned using the supply block is closed, causing an implicit close of the taps of the incoming WebSocket messages and also of $chat. Without this, the subscriber list of the $chat Supplier would grow without bound, and in turn keep alive an object graph of some size for each previous connection too.
Thus, even in this case where a live Supply is very directly involved, we'll often have subscriptions to it that come and go over time. On-demand supplies are primarily about resource acquisition and release; sometimes, that resource will be a subscription to a live Supply.
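The two risks above (concurrent delivery and leaked subscriptions) can be illustrated outside Raku. The following is a toy Python model, not how Rakudo implements supply blocks: Supplier, on_demand_merge, subscribe and subscriber_count are invented names. Nothing is subscribed until someone taps; each tap serializes delivery with a lock and returns a close() that unsubscribes from every source, which is what keeps the live Supplier's subscriber list from growing when a client disconnects. done/quit propagation is left out to keep it short.

```python
import threading
from typing import Callable, List

Callback = Callable[[str], None]


class Supplier:
    """Toy 'live' broadcaster: each value goes to whoever is subscribed right now."""

    def __init__(self) -> None:
        self._subscribers: List[Callback] = []
        self._lock = threading.Lock()

    def subscribe(self, callback: Callback) -> Callable[[], None]:
        with self._lock:
            self._subscribers.append(callback)

        def unsubscribe() -> None:
            with self._lock:
                self._subscribers.remove(callback)

        return unsubscribe

    def emit(self, value: str) -> None:
        with self._lock:
            current = list(self._subscribers)
        for callback in current:
            callback(value)

    def subscriber_count(self) -> int:
        with self._lock:
            return len(self._subscribers)


def on_demand_merge(*sources: Supplier) -> Callable[[Callback], Callable[[], None]]:
    """Toy on-demand merge: nothing is subscribed until someone taps.

    Each tap subscribes to every source, delivers one message at a time, and
    returns a close() that releases every subscription made for that tap."""

    def tap(consumer: Callback) -> Callable[[], None]:
        serial = threading.Lock()

        def deliver(value: str) -> None:
            with serial:  # sources may emit from different threads
                consumer(value)

        unsubscribers = [source.subscribe(deliver) for source in sources]

        def close() -> None:
            for unsubscribe in unsubscribers:
                unsubscribe()

        return close

    return tap


bread, vegetables = Supplier(), Supplier()
received: List[str] = []
close = on_demand_merge(bread, vegetables)(received.append)
vegetables.emit("Radish")
bread.emit("Thick sliced")
close()                      # e.g. the WebSocket client went away
vegetables.emit("Lettuce")   # no tap left, so this goes nowhere
print(received, bread.subscriber_count(), vegetables.subscriber_count())
```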
A fair question is whether we could have written this example without a supply block. And yes, we can; this probably works:
my $chat = Supplier.new;
get -> 'chat' {
    web-socket -> $incoming {
        my $emit-and-discard = $incoming.map(-> $message {
                $chat.emit(await $message.body-text);
                Supply.from-list()
            }).flat;
        Supply.merge($chat, $emit-and-discard)
    }
}
Note that it takes some effort in Supply-space to map into nothing. I personally find that less readable, and this didn't even avoid a supply block; it's just hidden inside the implementation of merge. Trickier still are cases where the number of supplies that are tapped changes over time, such as recursive file watching, where new directories to watch may appear. I don't really know how I'd express that in terms of combinators that appear in the standard library.
I spent some time teaching reactive programming (not with Raku, but with .NET). Things were easy with one asynchronous stream, but got more difficult when we started getting to cases with several of them. Some things fit naturally into combinators like "merge" or "zip" or "combine latest". Others can be bashed into those kinds of shapes with enough creativity, but it often felt contorted to me rather than expressive. And what happens when the problem can't be expressed in the combinators? In Raku terms, one creates output Suppliers, taps input supplies, writes logic that emits things from the inputs into the outputs, and so forth. Subscription management, error propagation, completion propagation, and concurrency control have to be taken care of each time, and it's oh so easy to mess it up.
Of course, the existence of supply blocks doesn't stop anyone from taking the fragile path in Raku too. This is what I meant when I said:
while Supplier is often reached for, many times one would be better off writing a supply block that emits the values
I wasn't thinking here about the publish/subscribe case, where we really do want to broadcast values and are at the entry point to a reactive chain. I was thinking about the cases where we tap one or more Supplys, take the values, do something, and then emit things into another Supplier. Here is an example where I migrated such code towards a supply block; here is another example that came a little later on in the same codebase. Hopefully these examples clear up what I had in mind.
I'm learning more about DDS every day, so my question may sound weird. I hope it makes sense.
One of the requirements of a DDS wrapper I'm writing is that it should time out after some period if it fails to write. My question: how can I do that?
The tutorial on Prism Tech's website explains how to use a WaitSet to block a read operation, but what about a write?
Here's some code including the question:
dds::domain::DomainParticipant dp(0);
dds::topic::Topic<MyType> topic(dp, "MyTopic");
dds::pub::Publisher pub(dp);
dds::pub::DataWriter<MyType> dw(pub, topic);
MyType t;
dw.write(t); //how can I make this block for 5 seconds (tops), and then throw an error on failure?
I noticed there is a function in the API, DataWriter::wait_for_acknowledgements(int timeout), but it seems to be bound to the DataWriter object, not to a specific write call. Can I bind it to the call above?
This is configured through QoS: see the RELIABILITY policy, field "max_blocking_time". How you set this value depends on the vendor's implementation. Generally you get the current QoS, update the field, and write the QoS back. Keep in mind that certain QoS policies must be set before something else happens. Reliability is "Before Enable" (at least in the implementation I'm most familiar with), which means you need to create the data writer disabled, update the QoS, then enable the writer.
If QoS can be set outside the application (via XML for example), then you can set the policy easily. Otherwise, you need to do it in code.
From the spec:
The value of the max_blocking_time indicates the maximum time the operation DataWriter::write is allowed to block if the DataWriter does not have space to store the value written. The default max_blocking_time=100ms.
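To see what max_blocking_time means in practice, here is a toy Python model (not a DDS binding; BoundedWriter, WriteTimeout and acknowledge_oldest are hypothetical). The writer's available history is modeled as a bounded queue: write() waits up to max_blocking_time for a free slot and then gives up, roughly as the spec has write fail with TIMEOUT. In real DDS code you would only set the RELIABILITY QoS field; the blocking itself is done by the middleware.

```python
import queue
import time


class WriteTimeout(Exception):
    """Raised when a sample could not be stored within max_blocking_time."""


class BoundedWriter:
    """Toy model of a reliable writer whose history/resource limits are full.

    Hypothetical names throughout; this is not a DDS API."""

    def __init__(self, capacity: int, max_blocking_time: float = 0.1) -> None:
        # 0.1 s mirrors the spec's default max_blocking_time of 100 ms.
        self._slots: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
        self.max_blocking_time = max_blocking_time

    def write(self, sample: object) -> None:
        """Block for at most max_blocking_time waiting for space, then give up."""
        try:
            self._slots.put(sample, block=True, timeout=self.max_blocking_time)
        except queue.Full:
            raise WriteTimeout(f"no space for {sample!r} within {self.max_blocking_time}s") from None

    def acknowledge_oldest(self) -> None:
        """Simulate a reader acknowledging the oldest sample, which frees a slot."""
        self._slots.get_nowait()


writer = BoundedWriter(capacity=1, max_blocking_time=0.05)
writer.write("first")            # space available: returns immediately
start = time.monotonic()
try:
    writer.write("second")       # no space: blocks about 50 ms, then fails
    timed_out = False
except WriteTimeout:
    timed_out = True
elapsed = time.monotonic() - start
writer.acknowledge_oldest()
writer.write("third")            # a slot was freed, so this succeeds at once
print(f"timed out: {timed_out}, waited {elapsed:.3f}s")
```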
We have a data set that grows while the application is processing the data set. After a long discussion we have come to the decision that we do not want blocking or asynchronous APIs at this time, and we will periodically query our data store.
We thought of two options to design an API for querying our storage:
A query method returns a snapshot of the data and a flag indicating whether we might have more data. When we finish iterating over the last returned snapshot, we query again to get another snapshot for the rest of the data.
A query method returns a "live" iterator over the data, and when this iterator advances it returns one of the following options: Data is available, No more data, Might have more data.
We are using C++ and we borrowed the .NET style enumerator API for reasons which are out of scope for this question. Here is some code to demonstrate the two options. Which option would you prefer?
/* ======== FIRST OPTION ============== */
#include <memory>
#include <tuple>

// Similar to the familiar .NET enumerator.
class IFooEnumerator
{
public:
    // true  --> A data element may be accessed using the Current() method.
    // false --> End of sequence. Calling Current() is an invalid operation.
    virtual bool MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IFooEnumerator() {}
};
enum class Availability
{
    EndOfData,
    MightHaveMoreData,
};
class IDataProvider
{
public:
    // Query params allow specifying the ID of the starting element. Here is the intended usage pattern:
    // 1. Call GetFoo() without specifying a starting point.
    // 2. Process all elements returned by IFooEnumerator until it ends.
    // 3. Check the availability.
    // 3.1 MightHaveMoreData --> Invoke GetFoo() again after some time by specifying the last processed element as the starting point
    //     and repeat steps (2) and (3).
    // 3.2 EndOfData --> The data set will not grow any more and we know that we have finished processing.
    virtual std::tuple<std::unique_ptr<IFooEnumerator>, Availability> GetFoo(query-params) = 0;
    virtual ~IDataProvider() {}
};
/* ====== SECOND OPTION ====== */
enum class Availability
{
    HasData,
    MightHaveMoreData,
    EndOfData,
};
class IGrowingFooEnumerator
{
public:
    // HasData:
    //     We may access the current data element by invoking Current().
    // EndOfData:
    //     The data set has finished growing and no more data elements will arrive later.
    // MightHaveMoreData:
    //     The data set may still grow and we need to continue calling MoveNext() periodically (preferably after a short delay)
    //     until we get a "HasData" or "EndOfData" result.
    virtual Availability MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IGrowingFooEnumerator() {}
};
class IDataProvider
{
public:
    virtual std::unique_ptr<IGrowingFooEnumerator> GetFoo(query-params) = 0;
    virtual ~IDataProvider() {}
};
Update
Given the current answers, I have some clarification. The debate is mainly over the interface: its expressiveness and intuitiveness in representing queries for a growing data set that at some point in time will stop growing. Both interfaces can be implemented without race conditions (at least we believe so) because of the following properties:
The 1st option can be implemented correctly if the pair of the iterator + the flag represent a snapshot of the system at the time of querying. Getting snapshot semantics is a non-issue, as we use database transactions.
The 2nd option can be implemented given a correct implementation of the 1st option. The "MoveNext()" of the 2nd option will, internally, use something like the 1st option and re-issue the query if needed.
The data-set can change from "Might have more data" to "End of data", but not vice versa. So if we, wrongly, return "Might have more data" because of a race condition, we just get a small performance overhead because we need to query again, and the next time we will receive "End of data".
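The second property in the list, Option 2 layered on Option 1, might look roughly like this. It is a Python sketch rather than the C++ interfaces above; GrowingEnumerator, fake_query and the integer IDs are assumptions for illustration. The enumerator drains the current snapshot, and when it runs dry it re-issues the query from the last element it handed out, using the end-of-data flag that came with that snapshot.

```python
from enum import Enum
from typing import Callable, Iterator, List, Optional, Tuple


class Availability(Enum):
    HAS_DATA = "HasData"
    MIGHT_HAVE_MORE_DATA = "MightHaveMoreData"
    END_OF_DATA = "EndOfData"


# Option 1 shape: query(start_after_id) -> (snapshot of IDs, end_of_data flag).
Query = Callable[[Optional[int]], Tuple[List[int], bool]]
_NO_ITEM = object()


class GrowingEnumerator:
    """Option 2 built on an Option 1 style query (names are illustrative)."""

    def __init__(self, query: Query) -> None:
        self._query = query
        self._snapshot: Iterator[int] = iter(())
        self._snapshot_is_final = False  # flag that came with the snapshot we hold
        self._last_id: Optional[int] = None
        self.current: Optional[int] = None

    def move_next(self) -> Availability:
        item = next(self._snapshot, _NO_ITEM)
        if item is _NO_ITEM and not self._snapshot_is_final:
            # Snapshot exhausted but the data set may have grown:
            # query again from the last element we handed out.
            snapshot, self._snapshot_is_final = self._query(self._last_id)
            self._snapshot = iter(snapshot)
            item = next(self._snapshot, _NO_ITEM)
        if item is not _NO_ITEM:
            self.current = self._last_id = item
            return Availability.HAS_DATA
        if self._snapshot_is_final:
            return Availability.END_OF_DATA
        return Availability.MIGHT_HAVE_MORE_DATA


data = [1, 2]       # fake growing data set (IDs in insertion order)
finished = False    # flips to True once and never back, as in the update above


def fake_query(start_after: Optional[int]) -> Tuple[List[int], bool]:
    """Snapshot of IDs after start_after, taken together with the end flag."""
    return [x for x in data if start_after is None or x > start_after], finished


enumerator = GrowingEnumerator(fake_query)
results = [enumerator.move_next() for _ in range(3)]   # HasData, HasData, MightHaveMoreData
data.append(3)
finished = True
results += [enumerator.move_next() for _ in range(2)]  # HasData, EndOfData
print([r.value for r in results])
```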
"Invoke GetFoo() again after some time by specifying the last processed element as the starting point"
How are you planning to do that? If it's using the earlier-returned IFooEnumerator, then functionally the two options are equivalent. Otherwise, letting the caller destroy the "enumerator" then however-long afterwards call GetFoo() to continue iteration means you're losing your ability to monitor the client's ongoing interest in the query results. It might be that right now you have no need for that, but I think it's poor design to exclude the ability to track state throughout the overall result processing.
Whether the overall system will work at all depends on many things (not going into details about your actual implementation):
No matter how you twist it, there will be a race condition between checking "Is there more data?" and more data being added to the system, which means it's possibly pointless to try to capture the last few data items.
You probably need to limit the number of repeated runs for "is there more data", or you could end up in an endless loop of "new data came in while processing the last lot".
How easy is it to know whether data has been updated? If all the updates are "new items" with new IDs that are sequentially higher, you can simply query "Is there data above X", where X is your last ID. But if you are, for example, counting how many items in the data have property Y set to value A, and data may be updated anywhere in the database at any time, it may be hard to determine what has changed since you last read it (e.g. a database of where taxis are at present, updated via GPS every few seconds for thousands of cars).
As to your implementation, in option 2 I'm not sure what you mean by the MightHaveMoreData state: either it has more data, or it hasn't, right? Repeated polling for more data is a bad design in this case, given that you will never be able to say with 100% certainty that no "new data" arrived between fetching the last data and processing and acting on it (displaying it, using it to buy shares on the stock market, stopping the train, or whatever it is that you want to do once you have processed your new data).
A read-write lock could help: many readers can access the data set simultaneously, but only one writer at a time.
The idea is simple:
- when you need read-only access, the reader takes a "read lock", which can be shared with other readers but is exclusive with writers;
- when you need write access, the writer takes a write lock, which is exclusive with both readers and writers.
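Python's standard library has no readers-writer lock, so here is a minimal sketch built on threading.Condition (ReadWriteLock, read_lock and write_lock are illustrative names). It prefers readers, so a steady stream of readers can starve the writer; a production version would usually let waiting writers go ahead of newly arriving readers.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, List


class ReadWriteLock:
    """Many concurrent readers OR a single writer (reader-preferring sketch)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer_active = False

    @property
    def active_readers(self) -> int:
        with self._cond:
            return self._readers

    @contextmanager
    def read_lock(self) -> Iterator[None]:
        with self._cond:
            while self._writer_active:        # readers only wait for an active writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()   # a waiting writer may proceed

    @contextmanager
    def write_lock(self) -> Iterator[None]:
        with self._cond:
            while self._writer_active or self._readers > 0:
                self._cond.wait()
            self._writer_active = True
        try:
            yield
        finally:
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()       # wake both readers and writers


lock = ReadWriteLock()
data_set: List[int] = []


def writer() -> None:
    for i in range(100):
        with lock.write_lock():
            data_set.append(i)


def reader(sizes: List[int]) -> None:
    for _ in range(50):
        with lock.read_lock():
            sizes.append(len(data_set))


reader_results: List[List[int]] = [[] for _ in range(3)]
threads = [threading.Thread(target=writer)]
threads += [threading.Thread(target=reader, args=(sizes,)) for sizes in reader_results]
for t in threads:
    t.start()
for t in threads:
    t.join()

with lock.read_lock():
    with lock.read_lock():                    # a second reader is admitted while the first holds it
        concurrent_readers = lock.active_readers
print(len(data_set), concurrent_readers)
```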
Background
I have a 2-tier web service - just my app server and an RDBMS. I want to move to a pool of identical app servers behind a load balancer. I currently cache a bunch of objects in-process. I hope to move them to a shared Redis.
I have a dozen or so caches of simple, small-sized business objects. For example, I have a set of Foos. Each Foo has a unique FooId and an OwnerId.
One "owner" may own multiple Foos.
In a traditional RDBMS this is just a table with an index on the PK FooId and one on OwnerId. I'm caching this in one process simply:
Dictionary<int,Foo> _cacheFooById;
Dictionary<int,HashSet<int>> _indexFooIdsByOwnerId;
Reads come straight from here, and writes go here and to the RDBMS.
I usually have this invariant:
"For a given group [say by OwnerId], the whole group is in cache or none of it is."
So when I cache miss on a Foo, I pull that Foo and all the owner's other Foos from the RDBMS. Updates make sure to keep the index up to date and respect the invariant. When an owner calls GetMyFoos I never have to worry that some are cached and some aren't.
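For reference, the in-process scheme described above (two dictionaries plus the "whole group or none of it" invariant) might be sketched in Python as follows. The loader and save callables are hypothetical stand-ins for the RDBMS queries. A write only touches the cache when the owner's group is already fully cached, since adding a lone Foo to an uncached group would break the invariant.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Foo:
    foo_id: int
    owner_id: int


class FooCache:
    """In-process cache that keeps 'whole owner group cached, or none of it'.

    load_owner_of / load_foos_of_owner / save are hypothetical RDBMS stand-ins."""

    def __init__(self, load_owner_of: Callable[[int], int],
                 load_foos_of_owner: Callable[[int], List[Foo]]) -> None:
        self._by_id: Dict[int, Foo] = {}               # like _cacheFooById
        self._ids_by_owner: Dict[int, Set[int]] = {}   # like _indexFooIdsByOwnerId
        self._load_owner_of = load_owner_of
        self._load_foos_of_owner = load_foos_of_owner

    def _load_group(self, owner_id: int) -> None:
        foos = self._load_foos_of_owner(owner_id)
        self._by_id.update((f.foo_id, f) for f in foos)
        self._ids_by_owner[owner_id] = {f.foo_id for f in foos}

    def get_foo(self, foo_id: int) -> Foo:
        if foo_id not in self._by_id:                  # miss: pull the owner's whole group
            self._load_group(self._load_owner_of(foo_id))
        return self._by_id[foo_id]

    def get_my_foos(self, owner_id: int) -> List[Foo]:
        if owner_id not in self._ids_by_owner:
            self._load_group(owner_id)
        return [self._by_id[i] for i in sorted(self._ids_by_owner[owner_id])]

    def add_foo(self, foo: Foo, save: Callable[[Foo], None]) -> None:
        save(foo)                                      # write through to the RDBMS
        group = self._ids_by_owner.get(foo.owner_id)
        if group is not None:                          # only extend fully cached groups
            self._by_id[foo.foo_id] = foo
            group.add(foo.foo_id)


db = [Foo(1, 7), Foo(2, 7), Foo(3, 8)]
queries: List[str] = []


def load_owner_of(foo_id: int) -> int:
    queries.append(f"owner of {foo_id}")
    return next(f.owner_id for f in db if f.foo_id == foo_id)


def load_foos_of_owner(owner_id: int) -> List[Foo]:
    queries.append(f"foos of {owner_id}")
    return [f for f in db if f.owner_id == owner_id]


cache = FooCache(load_owner_of, load_foos_of_owner)
cache.get_foo(1)                   # miss: loads owner 7's whole group
print(cache.get_my_foos(7))        # served from cache, no extra query
cache.add_foo(Foo(4, 7), db.append)
print(cache.get_my_foos(7), queries)
```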
What I did already
The first/simplest answer seems to be to use plain ol' SET and GET with a composite key and json value:
SET( "ServiceCache:Foo:" + theFoo.Id, JsonSerialize(theFoo));
I later decided I liked:
HSET( "ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
That lets me get all the values in one cache as HVALS. It also felt right - I'm literally moving hashtables to Redis, so perhaps my top-level items should be hashes.
This works to first order. If my high-level code is like:
UpdateCache(myFoo);
AddToIndex(myFoo);
That translates into:
HSET ("ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
var myFoos = JsonDeserialize( HGET ("ServiceCache:FooIndex", theFoo.OwnerId) );
myFoos.Add(theFoo.FooId);
HSET ("ServiceCache:FooIndex", theFoo.OwnerId, JsonSerialize(myFoos));
However, this is broken in two ways.
Two concurrent operations can read/modify/write at the same time. The latter "wins" the final HSET and the former's index update is lost.
Another operation could read the index in between the first and second lines. It would miss a Foo that it should find.
So how do I index properly?
I think I could use a Redis set instead of a json-encoded value for the index.
That would solve part of the problem since the "add-to-index-if-not-already-present" would be atomic.
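The lost update and the set-based fix can be simulated with plain Python data structures, interleaving the two clients' steps by hand (no Redis involved; the dicts merely stand in for the keys above).

```python
import json

# A dict stands in for the Redis hash "ServiceCache:FooIndex": OwnerId -> JSON list of FooIds.
foo_index = {7: json.dumps([1])}

# Clients A and B each add a Foo for owner 7; the steps are interleaved by hand.
a_ids = json.loads(foo_index[7])        # A: HGET
b_ids = json.loads(foo_index[7])        # B: HGET, sees the same old list
a_ids.append(2)
foo_index[7] = json.dumps(a_ids)        # A: HSET [1, 2]
b_ids.append(3)
foo_index[7] = json.dumps(b_ids)        # B: HSET [1, 3], silently dropping Foo 2
after_read_modify_write = json.loads(foo_index[7])

# Set-valued index: each client sends one "add member" command and the server
# applies it (what SADD does), so there is no client-side read-modify-write to race.
owner_sets = {7: {1}}
owner_sets[7].add(2)                    # A: SADD
owner_sets[7].add(3)                    # B: SADD
print(after_read_modify_write, sorted(owner_sets[7]))
```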
I also read about using MULTI as a "transaction" but it doesn't seem like it does what I want. Am I right that I can't really MULTI; HGET; {update}; HSET; EXEC since it doesn't even do the HGET before I issue the EXEC?
I also read about using WATCH and MULTI for optimistic concurrency, then retrying on failure. But WATCH only works on top-level keys. So it's back to SET/GET instead of HSET/HGET. And now I need a new index-like-thing to support getting all the values in a given cache.
If I understand it right, I can combine all these things to do the job. Something like:
while(!succeeded)
{
    WATCH( "ServiceCache:Foo:" + theFoo.FooId );
    WATCH( "ServiceCache:FooIndexByOwner:" + theFoo.OwnerId );
    WATCH( "ServiceCache:FooIndexAll" );
    MULTI();
    SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
    SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
    SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
    EXEC();
    //TODO somehow set succeeded properly
}
Finally I'd have to translate this pseudocode into real code depending how my client library uses WATCH/MULTI/EXEC; it looks like they need some sort of context to hook them together.
All in all, this seems like a lot of complexity for what has to be a very common case.
I can't help but think there's a better, smarter, Redis-ish way to do things that I'm just not seeing.
How do I lock properly?
Even if I had no indexes, there's still a (probably rare) race condition.
A: HGET - cache miss
B: HGET - cache miss
A: SELECT
B: SELECT
A: HSET
C: HGET - cache hit
C: UPDATE
C: HSET
B: HSET ** this is stale data that's clobbering C's update.
Note that C could just be a really-fast A.
Again I think WATCH, MULTI, retry would work, but... ick.
I know in some places people use special Redis keys as locks for other objects. Is that a reasonable approach here?
Should those be top-level keys like ServiceCache:FooLocks:{Id} or ServiceCache:Locks:Foo:{Id}?
Or make a separate hash for them - ServiceCache:Locks with subkeys Foo:{Id}, or ServiceCache:Locks:Foo with subkeys {Id} ?
How would I work around abandoned locks, say if a transaction (or a whole server) crashes while "holding" the lock?
For your use case, you don't need to use WATCH. You simply use a MULTI + EXEC block, and you'll have eliminated the race condition.
In pseudocode:
MULTI();
SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
EXEC();
This is sufficient because MULTI makes the following promise:
"It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction"
You don't need the watch and retry mechanism because you are not reading and writing in the same transaction.
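Two properties the answer relies on can be shown with a toy in-memory model (ToyServer and ToyTransaction are invented for illustration, not a Redis client API). Commands issued after MULTI only reply QUEUED and their real results arrive in the EXEC reply, which is why MULTI; HGET; {update}; HSET; EXEC can't work. EXEC then runs the whole batch with no other client's command in between.

```python
import threading
from typing import Any, Callable, Dict, List

Command = Callable[[Dict[str, Any]], Any]


class ToyServer:
    """In-memory stand-in for a Redis server; hypothetical, not a client library."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self._lock = threading.Lock()  # no other client's command runs during EXEC

    def run_atomically(self, commands: List[Command]) -> List[Any]:
        with self._lock:
            return [command(self.data) for command in commands]


class ToyTransaction:
    """Queues commands after MULTI; nothing executes until exec()."""

    def __init__(self, server: ToyServer) -> None:
        self._server = server
        self._queued: List[Command] = []

    def set(self, key: str, value: str) -> str:
        def command(data: Dict[str, Any]) -> str:
            data[key] = value
            return "OK"
        self._queued.append(command)
        return "QUEUED"

    def sadd(self, key: str, member: int) -> str:
        def command(data: Dict[str, Any]) -> int:
            members = data.setdefault(key, set())
            added = member not in members
            members.add(member)
            return int(added)
        self._queued.append(command)
        return "QUEUED"

    def get(self, key: str) -> str:
        self._queued.append(lambda data: data.get(key))
        return "QUEUED"  # the value only shows up in exec()'s reply list

    def exec(self) -> List[Any]:
        return self._server.run_atomically(self._queued)


server = ToyServer()
tx = ToyTransaction(server)
while_queuing = [
    tx.set("ServiceCache:Foo:42", '{"FooId": 42, "OwnerId": 7}'),
    tx.sadd("ServiceCache:FooIndexByOwner:7", 42),
    tx.sadd("ServiceCache:FooIndexAll", 42),
    tx.get("ServiceCache:Foo:42"),
]
replies = tx.exec()
print(while_queuing)
print(replies)
```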
I'm looking for some general advice on the optimization, correctness, and extensibility of my current C++ Hierarchical State Machine implementation.
Sample
variable isMicOn = false
variable areSpeakersOn = false
variable stream = false
state recording
{
//override block for state recording
isMicOn = true //here, only isMicOn is true
//end override block for state recording
}
state playback
{
//override block for state playback
areSpeakersOn = true //here, only areSpeakersOn = true
//end override block for state playback
state alsoStreamToRemoteIp
{
//override block for state alsoStreamToRemoteIp
stream = true //here, both areSpeakersOn = true and stream = true
//end override block for state alsoStreamToRemoteIp
}
}
goToState(recording)
goToState(playback)
goToState(playback.alsoStreamToRemoteIp)
Implementation
Currently, the HSM is implemented as a tree structure where each state can have a variable number of states as children.
Each state contains a variable number of "override" blocks (in a std::map) that override base values. At the root state, the state machine has a set of variables (functions, properties...) initialized to some default values. Each time we enter a child state, a list of "overrides" defines variables and values that should replace the variables and values of the same name in the parent state.
Referencing variables
At runtime, the current states are stored on a stack.
Every time a variable is referenced, a downwards stack walk is performed looking for the highest override, or in the case of no overrides, the default value.
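The stack walk can be sketched in a few lines of Python, using the sample's variables (the dict layout is an assumption, not the C++ std::map structure). collections.ChainMap implements the same "first mapping that has the key wins" rule. Each lookup costs time proportional to the stack depth; if lookups dominate, one option is to cache a flattened dict per pushed frame so that reads become single lookups.

```python
from collections import ChainMap
from typing import Dict, List

# Root defaults and per-state override blocks from the sample above.
DEFAULTS: Dict[str, bool] = {"isMicOn": False, "areSpeakersOn": False, "stream": False}
OVERRIDES: Dict[str, Dict[str, bool]] = {
    "recording": {"isMicOn": True},
    "playback": {"areSpeakersOn": True},
    "alsoStreamToRemoteIp": {"stream": True},
}


def lookup(name: str, stack: List[str]) -> bool:
    """Walk the state stack from the innermost (top) frame down; fall back to defaults."""
    for state in reversed(stack):
        block = OVERRIDES.get(state, {})
        if name in block:
            return block[name]
    return DEFAULTS[name]


def view(stack: List[str]) -> ChainMap:
    """Same rule via collections.ChainMap: the first mapping that has the key wins."""
    return ChainMap(*(OVERRIDES.get(s, {}) for s in reversed(stack)), DEFAULTS)


stack = ["playback", "alsoStreamToRemoteIp"]
print({name: lookup(name, stack) for name in DEFAULTS})
print(dict(view(stack)))
```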
Switching states
Each time a single state frame is switched to, the state is pushed onto a stack.
Each time a state is switched to, I trace the path from the current state up to the root state. Then I trace the path from the target state up toward the root until it meets the first trace, and I declare an intersection where those two traces meet. Then, to switch to the target state, I start from the source and pop state frames from the stack until I reach the intersection point. Then I walk down to the target node, pushing state frames onto the stack.
So, for the code sample above:
Execution trace for state switch
Source state = recording
Target State = alsoStreamToRemoteIp
path from source = recording->root (trace = [root])
path from target = alsoStreamToRemoteIp->playback->root (trace = [playback, root])
Intersects at root.
To switch from recording to alsoStreamToRemoteIp,
Pop "recording" from the stack (and call its exit function... not defined here).
Push "playback" onto the stack (and call the enter function).
Push "alsoStreamToRemoteIp" onto the stack (and call its enter function).
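The switch procedure amounts to finding the lowest common ancestor of the source and target states. A Python sketch using the sample's tree (PARENT, path_to_root and switch_to are hypothetical names) reproduces the trace above. One design choice to be aware of: switching to the current state exits and enters nothing here, whereas a UML external self-transition would exit and re-enter it.

```python
from typing import Dict, List, Optional, Tuple

# Parent links for the sample's state tree; "root" is the implicit top state.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "recording": "root",
    "playback": "root",
    "alsoStreamToRemoteIp": "playback",
}


def path_to_root(state: str) -> List[str]:
    """[state, parent, ..., root]"""
    path: List[str] = []
    node: Optional[str] = state
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def switch_to(stack: List[str], target: str) -> Tuple[List[str], List[str]]:
    """Pop frames down to the intersection, then push frames up to target.

    Returns (exited, entered) in the order exit/enter actions would run."""
    source_ancestors = set(path_to_root(stack[-1]))
    target_path = path_to_root(target)
    intersection = next(s for s in target_path if s in source_ancestors)
    exited: List[str] = []
    while stack[-1] != intersection:
        exited.append(stack.pop())           # call the state's exit() here
    entered = list(reversed(target_path[: target_path.index(intersection)]))
    stack.extend(entered)                    # call each state's enter() here, in order
    return exited, entered


stack = ["root", "recording"]
print(switch_to(stack, "alsoStreamToRemoteIp"))  # (['recording'], ['playback', 'alsoStreamToRemoteIp'])
print(stack)                                     # ['root', 'playback', 'alsoStreamToRemoteIp']
```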
Two things:
1: For most cases just represent the state of your program as a Model, and interact with it directly or through the MVC pattern.
2: If you really need an FSM, i.e. you want to randomly apply a bunch of actions to your model, only some of which are allowed at certain times, then:
Still keep the state of your program in a Model (or multiple Models, depending on decomposition and complexity) and represent states and transitions like this:
class State:
    def __init__(self):
        self.neighbors = {}
where neighbors is a dictionary of {Action: State}, so that you can do something like
someAction.execute() # Actions manipulate the model (use classes or lambdas)
currentState = currentState.neighbors[someAction]
Or even cooler, have an infinite loop randomly selecting an action from the neighbors, executing it, and moving state indefinitely. It's a great way to test your program.
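The random-walk testing idea might look like this in Python. The mic/speaker model and the record/play/stop actions are made up for illustration, and actions are plain functions rather than classes with execute(). A seeded random.Random makes any failing walk reproducible.

```python
import random
from typing import Callable, Dict, List

Model = Dict[str, bool]
Action = Callable[[Model], None]


class State:
    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbors: Dict[Action, "State"] = {}  # {Action: State}


# Hypothetical actions for a mic/speakers model; each one manipulates the model.
def record(m: Model) -> None:
    m.update(mic=True, speakers=False)


def play(m: Model) -> None:
    m.update(mic=False, speakers=True)


def stop(m: Model) -> None:
    m.update(mic=False, speakers=False)


idle, recording, playback = State("idle"), State("recording"), State("playback")
idle.neighbors = {record: recording, play: playback}
recording.neighbors = {stop: idle}
playback.neighbors = {stop: idle, record: recording}


def random_walk(start: State, model: Model, steps: int, seed: int = 0) -> List[str]:
    """Randomly pick allowed actions, execute them, follow transitions, check invariants."""
    rng = random.Random(seed)  # seeded so a failing walk can be replayed
    state, visited = start, [start.name]
    for _ in range(steps):
        action = rng.choice(list(state.neighbors))
        action(model)
        state = state.neighbors[action]
        assert not (model["mic"] and model["speakers"]), f"invariant broken in {state.name}"
        visited.append(state.name)
    return visited


trace = random_walk(idle, {"mic": False, "speakers": False}, steps=1000)
print(trace[:6])
```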
I'm not sure I follow all the details here. However, it seems that you are describing an FSM (finite state machine) implementation where you have multiple state machines. Sometimes, when a particular event (E1) occurs in a particular state (S1) of FSM F1, you need to enter a new FSM (call it F2) to simplify the overall processing.
If that's the case, then when E1 occurs in S1, you need to invoke an action routine that takes over the event reading and implements the F2 FSM. When invoked, it starts processing in the start state of F2, and handles the relevant sub-events. When it reaches its end state, the interpreter for F2 finishes. It might return some information to the F1 action routine that was suspended while F2 ran, and the next state in F1 may be affected by that.
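The approach of having an action routine take over event reading could be sketched in Python like this; the machines, states (S0, S1, S2) and events (E0, E1, a, b) are hypothetical. F1 and F2 share one event iterator, so the events F2 consumes are never seen by F1, and F2's result decides F1's next state.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical transition table for the sub-machine F2.
F2: Dict[Tuple[str, str], str] = {
    ("F2.start", "a"): "F2.middle",
    ("F2.middle", "b"): "F2.end",
}


def run_f2(events: Iterator[str]) -> str:
    """Action routine: takes over event reading until F2 reaches its end state."""
    state = "F2.start"
    for event in events:
        state = F2.get((state, event), state)  # events F2 doesn't handle are ignored
        if state == "F2.end":
            return "done"
    return "input ended"  # events ran out before F2 finished


def run_f1(event_source: Iterable[str]) -> List[str]:
    """Outer machine F1; returns the sequence of F1 states it passed through."""
    events = iter(event_source)  # shared: whatever F2 consumes, F1 never sees
    state, history = "S0", ["S0"]
    for event in events:
        if state == "S0" and event == "E0":
            state = "S1"
        elif state == "S1" and event == "E1":  # E1 in S1: hand control to F2
            result = run_f2(events)            # F1 is suspended while F2 runs
            state = "S2" if result == "done" else "S0"  # F2's result picks F1's next state
        else:
            continue  # no F1 transition for this event
        history.append(state)
    return history


print(run_f1(["E0", "E1", "a", "x", "b"]))  # ['S0', 'S1', 'S2']
print(run_f1(["E0", "E1", "a"]))            # ['S0', 'S1', 'S0']
```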
The rest of your description - stuff like 'override blocks' - won't make much sense to people without access to your implementation.