Question:
Suppose client A is watching znode /a and also reads /b, and client B deletes /a before updating /b. Client A will stop reading /b once it gets the notification that /a is gone.
Is it possible for the following to occur in order?
1. client A reads /a and sets watch
2. client A sees /a exists
3. client B deletes /a
4. client B receives response that /a is gone
5. client A reads /b
6. client A gets outdated information
7. client A gets notified /a is gone, but it's too late
I assume there is a step 3.5, "client B updates /b"; also, I don't think step 4 is relevant.
The ZooKeeper guarantees are documented here:
Watches are ordered with respect to other events, other watches, and
asynchronous replies. The ZooKeeper client libraries ensures that
everything is dispatched in order.
If you use Java and only use the async methods, then on the ZooKeeper event thread you will get a watch event saying /a was deleted before the completion of the async read of /b.
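For completeness, here is a minimal sketch of that fully-async pattern. It uses the ZooKeeper C client (multi-threaded adapter) from C++ rather than the Java binding, but the ordering guarantee is the same: watches and async completions are dispatched in order on the single event thread. The paths, server address, and what the watcher does are illustrative only.

#include <zookeeper/zookeeper.h>
#include <cstdio>

void global_watcher(zhandle_t*, int, int, const char*, void*) {
    // Session events (connected / expired) would be handled here.
}

void a_watcher(zhandle_t*, int type, int, const char* path, void*) {
    if (type == ZOO_DELETED_EVENT) {
        // Delivered on the event thread *before* the completion of any read
        // of /b that the server answered after /a was deleted.
        std::printf("%s is gone, stop trusting /b\n", path);
    }
}

void b_read_done(int rc, const char* value, int value_len, const struct Stat*, const void*) {
    if (rc == ZOK) {
        std::printf("read /b: %.*s\n", value_len, value);
    }
}

void a_exists_done(int rc, const struct Stat*, const void* data) {
    zhandle_t* zh = static_cast<zhandle_t*>(const_cast<void*>(data));
    if (rc == ZOK) {
        // /a exists; now read /b asynchronously. If /a is deleted in the
        // meantime, a_watcher fires before b_read_done is invoked.
        zoo_aget(zh, "/b", 0, b_read_done, nullptr);
    }
}

int main() {
    zhandle_t* zh = zookeeper_init("localhost:2181", global_watcher, 30000,
                                   nullptr, nullptr, 0);
    // Set a watch on /a and, once it is known to exist, read /b.
    zoo_awexists(zh, "/a", a_watcher, nullptr, a_exists_done, zh);
    // ... run the application / wait here; callbacks arrive on the event thread ...
    zookeeper_close(zh);
    return 0;
}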
There is a complication, though, if you use the synchronous ZooKeeper API, since then you are introducing more threads into your code, which can violate the ordering guarantees. See the notes in the Java binding documentation, especially this part:
Synchronous calls may not return in the correct order. For example,
assume a client does the following processing: issues an asynchronous
read of node /a with watch set to true, and then in the completion
callback of the read it does a synchronous read of /a. (Maybe not good
practice, but not illegal either, and it makes for a simple example.)
Note that if there is a change to /a between the asynchronous read and
the synchronous read, the client library will receive the watch event
saying /a changed before the response for the synchronous read, but
because the completion callback is blocking the event queue, the
synchronous read will return with the new value of /a before the watch
event is processed.
So if you are using the synchronous API, client A may see the watch event for /a after the read of /b.
In ZooKeeper, only writes are linearizable, not reads.
Because of that, even when using the async API, it is possible in step 5 for client A to read a stale value, after another client has already received a response confirming its write (e.g. that /a was successfully deleted).
This is because your read request might reach a ZooKeeper node that still thinks it is the leader; that node will not talk to the other replicas before responding to the read, so it will not discover quickly enough that it is no longer the leader.
If you want your read to return the latest value, you can force linearizability by always calling sync() before doing the read.
This guarantees that your read will see the effect of all writes that happened, in wall-clock time, before your call to sync().
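A sketch of that "sync before read" idea, again with the ZooKeeper C client from C++: zoo_async() is the client-side counterpart of sync(), and the read is issued only from its completion callback, so it observes all writes acknowledged before the sync was requested. The path and callback names are illustrative.

#include <zookeeper/zookeeper.h>

void value_done(int rc, const char* value, int value_len,
                const struct Stat*, const void*) {
    // 'value' reflects every write that completed before the sync below.
}

void sync_done(int rc, const char*, const void* data) {
    zhandle_t* zh = static_cast<zhandle_t*>(const_cast<void*>(data));
    if (rc == ZOK) {
        zoo_aget(zh, "/a", 0, value_done, nullptr);
    }
}

void linearizable_read(zhandle_t* zh) {
    // Flush the leader channel first, then read from the completion.
    zoo_async(zh, "/a", sync_done, zh);
}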
I am developing an event-sourced Electric Vehicle Charging Station Management System, which is connected to several Charging Stations. In this domain, I've come up with an aggregate for the Charging Station, which includes the internal state of the Charging Station (whether it is network-connected, whether a car is charging using one of the station's connectors).
The station notifies me about its state through messages defined in a standardized protocol:
Heartbeat: whether the station is still "alive"
StatusNotification: whether the station has encountered an error (under voltage), or if everything is correct
And my server can send commands to this station:
RemoteStartTransaction: tells the station to unlock and reserve one of its connectors, for a car to charge using the connector.
I've developed an Aggregate for this Charging Station. It contains the internal entities of its connectors: whether each is charging or not, whether it has a problem in the power system, ...
The Aggregate, whose in-memory representation resides in the server I control (not in the Charging Station itself), has a StationClient service, which is responsible for sending these commands to the physical Charging Station (pseudocode):
class StationAggregate {
  stationClient: StationClient
  URL: string
  connectors: Connector[]

  unlock(connectorId) {
    // Reject the command if the connector cannot be unlocked right now.
    if !this.connectors.find(connectorId).isAvailableToBeUnlocked() {
      return ErrorConnectorNotAvailable
    }
    // Side effect: tell the physical station to unlock the connector.
    error = this.stationClient.sendRemoteStartTransaction(this.URL, connectorId)
    if error {
      return ErrorStationRejectedUnlock
    }
    this.applyEvents([
      StationUnlockedEvent(connectorId, now())
    ])
    return Ok
  }

  receiveHeartbeat(timestamp) {
    this.applyEvents([
      StationSentHeartbeat(timestamp)
    ])
    return Ok
  }
}
I am using optimistic concurrency, which means that I load the Aggregate from a list of events and store the current version of the Aggregate in its in-memory representation: if the StationAggregate is at version #2032 and a command is successfully processed and its event(s) applied, it would then be at version #2033, for example. That way I can put a unique constraint on the (StationID, Version) tuple in my persistence layer, and guarantee that only one event is persisted per version.
Now suppose a Heartbeat message and an Unlock command are received at the same time. Both threads would load the StationAggregate at version X. Receiving the Heartbeat has no side effects, but the Unlock command has one: it tells the physical Charging Station to unlock. However, since I'm using optimistic concurrency, that StationUnlocked event could be rejected by the persistence layer. I don't know how to handle that, because I can't simply retry the command: it is inherently not idempotent (the physical Station would reject the second request).
I don't know if I'm modelling something wrong, or if it's really a hard domain to model.
I am not sure I fully understand the problem, but the idea of optimistic concurrency is to prevent writes in case of a race condition. Versions are used to ensure that your write operation has the version that is +1 from the version you've got from the database before executing the command.
So, in case there's a parallel write that won and you got the wrong version exception back from the event store, you retry the command execution entirely, meaning you read the stream again and by doing so you get the latest state with the new version. Then, you give the command to the aggregate, which decides if it makes sense to perform the operation or not.
The issue is not particularly related to Event Sourcing, it is just as relevant for any persistence and it is resolved in the same way.
Event Sourcing could bring you additional benefits since you know what happened. Imagine that by accident you got the Unlock command twice. When you got the "wrong version" back from the store, you can read the last event and decide if the command has already been executed. It can be done logically (there's no need to unlock if it's already unlocked, by the same customer), technically (put the command id to the event metadata and compare), or both ways.
When handling duplicate commands, it makes sense to ensure a decent level of idempotence of the command handling, ignore the duplicate and return OK instead of failing to the user's face.
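To make the retry concrete, here is a rough C++ sketch of that loop. Every type and method name in it (EventStore, Aggregate, Decide, Append) is invented for illustration; the important shape is that a version conflict triggers a re-read of the stream and a fresh decision - possibly a no-op - rather than a blind re-append of the old events.

#include <optional>
#include <string>
#include <vector>

struct Event { std::string type; };

struct Aggregate {
    long version = 0;
    // Rebuilt from the stream; returns no events if the command is a
    // duplicate or no longer makes sense (e.g. connector already unlocked).
    std::optional<std::vector<Event>> Decide(const std::string& command) const;
};

struct EventStore {
    Aggregate Load(const std::string& streamId) const;
    // Returns false if expectedVersion no longer matches (someone wrote first).
    bool Append(const std::string& streamId, long expectedVersion,
                const std::vector<Event>& events);
};

bool HandleCommand(EventStore& store, const std::string& streamId,
                   const std::string& command) {
    for (int attempt = 0; attempt < 5; ++attempt) {
        Aggregate agg = store.Load(streamId);      // latest state + version
        auto events = agg.Decide(command);
        if (!events) return true;                  // duplicate or no-op: report OK
        if (store.Append(streamId, agg.version, *events)) return true;
        // Wrong expected version: another writer won; re-read and try again.
    }
    return false;                                  // persistent contention
}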
Another observation I can make, given the very limited amount of information about the domain, is that heartbeats are telemetry, while locking and unlocking are business. I don't think it makes a lot of sense to combine those two distinctly different things in one domain object.
Update, following the discussion in comments:
What you got with sending the command to the station at the same time as producing the event, is the variation of two-phase commit. Since it's not executed in a transaction, any of the two operations could fail and lead the system to an inconsistent state. You either don't know if the station got the command to unlock itself if the command failed to send, or you don't know that it's unlocked if the event persistence failed. You only got as far as the second operation, but the first case could happen too.
There are quite a few ways to solve it.
First, you can solve it entirely technically. With MassTransit, it's quite easy to fix using the Outbox. It will not send any outgoing messages until the consumer of the original message has fully completed its work. Therefore, if the consumer of the Unlock command fails to persist the event, the command will not be sent. Then the retry filter would kick in and the whole operation would be executed again; since by then you are out of the race condition, the operation would complete properly.
But it won't solve the issue when your command to the physical station fails to send (I reckon it is an edge case).
This issue can also be easily solved, and here Event Sourcing is helpful. You'd need to move sending the command to the station from the original (user-driven) command handler to a subscriber. You subscribe to the event stream for the StationUnlocked event and let the subscriber send commands to the station. With that, you would only send commands to the station if the event was persisted, and you can retry sending the command as many times as you need.
Finally, you can solve it in a more meaningful way and change the semantics. I already mentioned that heartbeats are telemetry messages. I could expect the station also to respond to lock and unlock commands, telling you if it actually did what you asked.
You can use the station telemetry to create a representation of the physical station, which is not a part of the aggregate. In fact, it's more like an ACL to the physical world, represented as a read model.
When you have such a mirror of the physical station on your side, then when you execute the Unlock command in your domain you can engage a domain service to consult the current station state and make a decision. If you find out that the station is already unlocked and the session id matches (yes, I remember our previous discussion :)) - you return OK and safely ignore the command. If it's locked - you proceed. If it's unlocked and the session id doesn't match - it's obviously an error and you need to do something else.
In this last option, you would clearly separate telemetry processing from the business so you won't have heartbeats impact your domain model, so you really won't have the versioning issue. You also would always have a place to look at to understand what is the current state of the physical station.
I'm creating an async gRPC server in C++. One of the methods streams data from the server to clients - it's used to send data updates to clients. The frequency of the data updates isn't predictable. They could be nearly continuous or as infrequent as once per hour. The model used in the gRPC example with the "CallData" class and the CREATE/PROCESS/FINISH states doesn't seem like it would work very well for that. I've seen an example that shows how to create a 'polling' loop that sleeps for some time and then wakes up to check for new data, but that doesn't seem very efficient.
Is there another way to do this? If I use the "CallData" method can it block in the 'PROCESS' state until there's data (which probably wouldn't be my first choice)? Or better, can I structure my code so I can notify a gRPC handler when data is available?
Any ideas or examples would be appreciated.
In a server-side streaming example, you probably need more states, because you need to track whether there is currently a write already in progress. I would add two states, one called WRITE_PENDING that is used when a write is in progress, and another called WRITABLE that is used when a new message can be sent immediately. When a new message is produced, if you are in state WRITABLE, you can send immediately and go into state WRITE_PENDING, but if you are in state WRITE_PENDING, then the newly produced message needs to go into a queue to be sent after the current write finishes. When a write finishes, if the queue is non-empty, you can grab the next message from the queue and immediately start a write for it; otherwise, you can just go into state WRITABLE and wait for another message to be produced.
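A rough sketch of that bookkeeping, assuming the usual async-server CallData pattern: Update stands in for the protobuf message your stream sends (updates.grpc.pb.h is a hypothetical generated header), writer_ is the call's grpc::ServerAsyncWriter, and the tag passed to Write() is assumed to be routed back to OnWriteDone() by your completion-queue loop.

#include <deque>
#include <mutex>
#include <grpcpp/grpcpp.h>
#include "updates.grpc.pb.h"   // hypothetical generated header defining Update

class StreamCall {
 public:
  explicit StreamCall(grpc::ServerAsyncWriter<Update>* writer) : writer_(writer) {}

  // Application code calls this whenever new data is produced.
  void OnNewUpdate(const Update& update) {
    std::lock_guard<std::mutex> lock(mu_);
    if (state_ == State::WRITABLE) {
      state_ = State::WRITE_PENDING;
      writer_->Write(update, this);            // completion arrives on the CQ
    } else {
      pending_.push_back(update);              // a write is already in flight
    }
  }

  // Called when the completion queue reports the previous Write() finished.
  void OnWriteDone() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!pending_.empty()) {
      writer_->Write(pending_.front(), this);  // stay in WRITE_PENDING
      pending_.pop_front();
    } else {
      state_ = State::WRITABLE;                // idle until the next update
    }
  }

 private:
  enum class State { WRITABLE, WRITE_PENDING };
  State state_ = State::WRITABLE;
  std::deque<Update> pending_;
  std::mutex mu_;
  grpc::ServerAsyncWriter<Update>* writer_;
};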
There should be no need to block here, and you probably don't want to do that anyway, because it would tie up a thread that should otherwise be polling the completion queue. If all of your threads wind up blocked that way, you will be blind to new events (such as new calls coming in).
An alternative here would be to use the C++ sync API, which is much easier to use. In that case, you can simply write straight-line blocking code. But the cost is that it creates one thread on the server for each in-progress call, so it may not be feasible, depending on the amount of traffic you're handling.
I hope this information is helpful!
I am currently working on a server application in C++. My main inspirations are these examples:
Windows SDK IOCP Excample
The I/O Completion Port IPv4/IPv6 Server Program Example
My app is very similar to these (socketobj, packageobj, ...).
In general, my app runs without issues. The only thing that still causes me trouble is half-open connections.
My strategy for this is: I check every connected client periodically and increment an idle counter. If a completion occurs, I reset this counter. If the idle counter gets too high, I set a boolean to prevent other threads from posting operations, and then call closesocket().
My assumption was that now that the socket is closed, the pending operations will complete (maybe not instantly, but after a while). This is also the behavior the MSDN documentation describes (hints, second paragraph). I need this because only after all operations have completed can I free the resources.
Long story short: this is not the case for me. I did some tests with my test-client app and some cout and breakpoint debugging, and discovered that pending operations for closed sockets are not completing (even after waiting 10 minutes). I also tried a shutdown() call before closesocket(); both returned no error.
What am I doing wrong? Does this happen to anyone else? Is the MSDN documentation wrong? What are the alternatives?
I am currently thinking of the "linger" functionality, or of cancelling every operation explicitly with the CancelIoEx() function.
Edit: (thank you for your responses)
Yesterday evening I added a linked list to every socketobj to hold the per-I/O objects of the pending operations. With this I tried the CancelIoEx() function. The function returned 0 and GetLastError() returned ERROR_NOT_FOUND for most of the operations.
Is it then safe to just free the per-I/O object in this case?
I also discovered that this happens more often when I run my server app and the client app on the same machine. From time to time the server is then not able to complete write operations. I thought this was happening because the client-side receive buffer gets too full. (The client side does not stop receiving data!)
Code snippet follows as soon as possible.
The 'linger' setting can be used to reset the connection, but that way you will (a) lose data and (b) deliver a reset to the peer, which may terrify it.
If you're thinking of a positive linger timeout, it doesn't really help.
Shutdown for read should terminate read operations, but shutdown for write only gets queued after pending writes so it doesn't help at all.
If pending writes are the problem, and not completing, they will have to be cancelled.
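For the CancelIoEx() route mentioned in the question's edit, here is a minimal sketch of how the cancellation is usually handled, assuming each outstanding operation carries its own OVERLAPPED inside a per-I/O structure (PerIoData is a hypothetical name):

#include <winsock2.h>
#include <windows.h>

// Hypothetical per-operation bookkeeping; OVERLAPPED is the first member so
// the pointer handed back by GetQueuedCompletionStatus() can be mapped to it.
struct PerIoData {
    OVERLAPPED overlapped;
    // WSABUF, operation type, buffers, ...
};

void CancelPendingOperation(SOCKET s, PerIoData* io) {
    if (!CancelIoEx(reinterpret_cast<HANDLE>(s), &io->overlapped)) {
        if (GetLastError() == ERROR_NOT_FOUND) {
            // Nothing left to cancel: the operation has already completed (or
            // is completing), so its completion packet will still be dequeued
            // normally from the IOCP.
        }
    }
    // Do NOT free 'io' here. Cancelled operations still complete through the
    // completion port, typically with ERROR_OPERATION_ABORTED; free the
    // per-I/O object in that completion path instead.
}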
I am trying to write a connection pool using hiredis.
The problem I am facing is: if a user fires a command and doesn't read the response from the connection, I need to clear that response from the connection before putting it back into the pool.
Is there any way to check:
Is there more data to read? Then I could call redisGetReply until all data is cleared.
Or is there a way to clear all pending reads on the connection object?
The question is not clear, as it fails to state whether you are using sync or async operations.
You mention redisGetReply, so I assume you are using sync operations. Sync calls are blocking calls: the response to a command is available in the same call. A scenario where you might want to check whether all data has been read is when the context is shared between threads and you check for data before returning the connection to the pool.
Yes, redisGetReply can be used to check whether there is more data to read.
For async calls, use redisAsyncHandleRead to check if there is data to be read.
Internally, both redisGetReply and redisAsyncHandleRead call redisBufferRead.
For sync calls, use redisFree to clear the context.
For async calls, use redisAsyncFree to clear the context.
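As a rough sketch of the sync case: if the pool wrapper keeps count of how many commands were pipelined but not yet read (pending_replies below is such a hypothetical counter), it can drain them with redisGetReply before reusing the connection, and fall back to redisFree if the context turns out to be broken.

#include <hiredis/hiredis.h>

// Returns true if the connection is clean and can go back into the pool.
bool drain_connection(redisContext* c, int pending_replies) {
    while (pending_replies > 0) {
        void* reply = nullptr;
        if (redisGetReply(c, &reply) != REDIS_OK) {
            return false;      // context is broken; caller should redisFree() it
        }
        freeReplyObject(reply);
        --pending_replies;
    }
    return true;
}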
I am using $SUB for the first time and have come across this problem. Both, client and server use overlapped operations and here is the specific situation I have a problem with.
Client
C1. Connects to the server.
C2. Sends a message bigger than the pipe buffer and the buffer passed to the overlapped read operation in the server.
C3. Successfully cancels the send operation.
Server
S1. Creates and waits for the client.
S2. When the client is connected, it reads the message.
S2.1. Because the message doesn't fit into the buffer (ERROR_MORE_DATA), it is read part by part.
It seems to me that there is no way to tell when the whole message, as an isolated unit, has been cancelled. In particular, if the client cancels the send operation, the server does not receive the whole message, just a part of it, and the subsequent read operation returns ERROR_IO_PENDING (in my case), which means there is no data to be read and the read operation has been queued. I would expect some means of telling the reader that the message has been cancelled, so that the reader can act upon it.
However, the relevant documentation is scattered across MSDN, so I may well be missing something. I would really appreciate it if anyone could shed some light on this. Thanks.
You are correct, there is no way to tell.
If you cancel the WriteFile partway through, only part of the message will be written, so only that part will be read by the server. There is no "bookkeeping" information sent about how large the message was going to be before you cancelled it - what is sent is just the raw data.
So the answer is: Don't cancel the IO, just wait for it to succeed.
If you do need to cancel IO partway through, you should probably cut the connection and start again from the beginning, just as you would for a network outage.
(You could check your OVERLAPPED structure to find out how much was actually written, and carry on from there, but if you wanted to do that you would probably just not cancel the IO in the first place.)
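For what it's worth, that parenthetical option would look roughly like this; how useful the byte count is after a cancellation depends on how far the driver got, so treat it as a sketch:

#include <windows.h>

DWORD BytesWrittenAfterCancel(HANDLE pipe, OVERLAPPED* ov) {
    DWORD written = 0;
    // TRUE = wait until the (possibly cancelled) write has fully completed.
    if (GetOverlappedResult(pipe, ov, &written, TRUE)) {
        // The write finished before the cancel took effect.
    } else if (GetLastError() == ERROR_OPERATION_ABORTED) {
        // Cancelled partway; 'written' holds whatever was transferred before
        // the cancel (it may be 0).
    }
    return written;   // you could in principle resume sending from here
}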
Why did you want to cancel the IO anyway? What set of circumstances triggers this requirement?