Redshift stv_wlm_query_state.state: QUEUED vs QUEUEDWAITING - amazon-web-services

In redshift stv_wlm_query_state system table, what are the differences between QUEUED state and QUEUEDWAITING state?

I've not seen an exact and authoritative set of definitions for queue states published but I have a general understanding that has been useful to me. When a query is submitted it needs to be processed through many steps like compiling, running and returning data. These are all reflected in queue states but there is also time before and between these steps as the query progresses. QUEUED just means that the query is in the queue process but not in another defined state.
Since parallel execution of queries is limited by the WLM and the number of slots available there is a defined state for queries that are waiting on other queries to finish before they can be executed. This specific waiting-for-an-execution-slot state is QUEUEDWAITING. This is generally the most common place for significant waiting to occur and is directly optimizable through the WLM (but possibly not fixed). Delays caused by a flurry of very complex queries needing to be compiled and optimized by the leader would not create QUEUEDWAITING states but these could just show up as QUEUED state.
This is my working understanding based on experience. If someone posts an authoritative set of definitions for queue states I'll be as interested as you are.

Related

Event Sourcing/CQRS doubts about aggregates, atomicity, concurrency and eventual consistency

I'm studying event sourcing and command/query segregation and I have a few doubts that I hope someone with more experience will easily answer:
A) should a command handler work with more than one aggregate? (a.k.a. should they coordinate things between several aggregates?)
B) If my command handler generates more than one event to store, how do you guys push all those events atomically to the event store? (how can I garantee no other command handler will "interleave" events in between?)
C) In many articles I read people suggest using optimistic locking to write the new events generated, but in my use case I will have around 100 requests / second. This makes me think that a lot of requests will just fail at huge rates (a lot of ConcurrencyExceptions), how you guys deal with this?
D) How to deal with the fact that the command handler can crash after storing the events in the event store but before publishing them to the event bus? (how to eventually push those "confirmed" events back to the event bus?)
E) How you guys deal with the eventual consistency in the projections? you just live with it? or in some cases people lock things there too? (waiting for an update for example)
I made a sequence diagram to better ilustrate all those questions
(and sorry for the bad english)
If my command handler generates more than one event to store, how do you guys push all those events atomically to the event store?
Most reasonable event store implementations will allow you to batch multiple events into the same transaction.
In many articles I read people suggest using optimistic locking to write the new events generated, but in my use case I will have around 100 requests / second.
If you have lots of parallel threads trying to maintain a complex invariant, something has gone badly wrong.
For "events" that aren't expected to establish or maintain any invariant, then you are just writing things to the end of a stream. In other words, you are probably not trying to write an event into a specific position in the stream. So you can probably use batching to reduce the number of conflicting writes, and a simple retry mechanism. In effect, you are using the same sort of "fan-in" patterns that appear when you have concurrent writers inserting into a queue.
For the cases where you are establishing/maintaining an invariant, you don't normally have many concurrent writers. Instead, specific writers have authority to write events (think "sharding"); the concurrency controls there are primarily to avoid making a mess in abnormal conditions.
How to deal with the fact that the command handler can crash after storing the events in the event store but before publishing them to the event bus?
Use pull, rather than push, as the primary subscription mechanism. Make sure that subscribers can handle duplicate messages safely (aka "idempotent"). Don't use a message subscription that can re-order events when you need events strictly ordered.
How you guys deal with the eventual consistency in the projections? you just live with it?
Pretty much. Views and reports have metadata information in them to let you know at what fixed point in "time" the report was accurate.
Unless you lock out all writers while a report is being consumed, there's a potential for any data being out of date, regardless of whether you are using events vs some other data model, regardless of whether you are using a single data model or several.
It's all part of the tradeoff; we accept that there will be a larger window between report time and current time in exchange for lower response latency, an "immutable" event history, etc.
should a command handler work with more than one aggregate?
Probably not - which isn't the same thing as always never.
Usual framing goes something like this: aggregate isn't a domain modeling pattern, like entity. It's a lifecycle pattern, used to make sure that all of the changes we make at one time are consistent.
In the case where you find that you want a command handler to modify multiple domain entities at the same time, and those entities belong to different aggregates, then have you really chosen the correct aggregate boundaries?
What you can do sometimes is have a single command handler that manages multiple transactions, updating a different aggregate in each. But it might be easier, in the long run, to have two different command handlers that each receive a copy of the command and decide what to do, independently.

[Apache Beam/ Dataflow]: Maintain states outside of the Dataflow Job

We have a problem where we need to maintain states for a long period of time (weeks) across windows. Widening a window is not feasible and in order to guarantee that state is maintained throughout failure scenarios (topology restarts).
Example use case:
I want to see if an callback for a given impression have been observed within past 5 weeks period as I am processing the events. Based on the state check, I may or may not process this event as part of the aggregation. The most important requirement here is that I need to keep the state across multiple windows and keeping a long window and triggering is not feasible as we are interested in lookback of up to 5 weeks.
I am leaning toward using external state rather than using stateful DoFns which is keyed per window. For example, using KV store to lookup the state from the Dataflow job as needed. There is some overhead involved in this approach as we need to maintain external storages and this state need to be atomic with respect to different life cycle scenarios. For example if a state is updated but job dies, how do we rollback the state?
What is the best way to approach this type of problems?
Is there a publication such as a white paper that discusses this
type of problems?

How do I multiplex many asynchronous state machines over a fixed number of threads with boost::statechart?

Suppose I have many asynchronous state machines defined with boost::statechart. The clearly documented mechanism for running multiple asynchronous state machines is to fix one or more of them to a thread. However, for my purpose I need to run many, many asynchronous state machines, and one per thread will not do. Moreover, the amount of work done by any given state machine is unpredictable, so assigning state machines to fixed threads will lead to imbalance.
Instead, I'd like to have a thread pool where an idle thread can pick up some amount of work off of a queue. Some care needs to be taken here so that events to a given state machine are delivered in order. Presumably the place to start would be something involving implementing the Scheduler and perhaps the FifoWorker concepts to do what I want as an alternative to the fifo_scheduler and fifo_worker classes, respectively. However, I wonder if this problem has already been solved by someone else, or if I'm just asking the wrong question.
Answering my own question, now that I've had some time to think about it. This is pretty simple:
Every state machine gets its own fifo_scheduler
When we want the state machine to start running, a function is posted to the thread pool that:
Checks scheduler.terminated() and stops if so.
Runs scheduler(n), where n is some implementation-dependent value. We need that to prevent starvation.
Posts itself back to the thread pool.
This also ensures that events are delivered in order without resorting to other means.
This isn't the greatest answer, since the service function will occupy a space in the queue and be called even when there's no work to do.

Amazon SQS message multi-delivery

I understand that to bring vast scalability and reliability, SQS does extensive parallelization of resources. It uses redundant servers for even small queues and even the messages posted to the queues are stored redundantly as multiple copies. These are the factors which prevent it from exactly-once-delivery like in RabbitMQ. I have seen even deleted messages being delivered.
The implications for the developers is that they need to be prepared for multiple delivery of messages. Amazon claims it not to be a problem, but it it is, then the developer must use some synchronization construct like a database-transaction-lock or dynamo-db conditional write. both of these reduce scalability.
Question is,
In light of the duplicate delivery problem, how the message-invisible-period feature holds? The message is not guaranteed to be invisible. If the developer has to make own arrangements for synchronization, what benefit is of the invisibility-period. I have seen messages re-delivered even when they were supposed to be invisible.
Edit
here i include some references
What is a good practice to achieve the "Exactly-once delivery" behavior with Amazon SQS?
http://aws.amazon.com/sqs/faqs/#How_many_times_will_I_receive_each_message
http://aws.amazon.com/sqs/faqs/#How_does_Amazon_SQS_allow_multiple_readers_to_access_the_same_message_queue_without_losing_messages_or_processing_them_many_times
http://aws.amazon.com/sqs/faqs/#Can_a_deleted_message_be_received_again
Message invisibility solves a different problem to guaranteeing one and only one delivery. Consider a long running operation on an item in the queue. If the processor craps out during the operation, you don't want to delete the message, you want it to reappear and be handled again by a different processor.
So the pattern is...
Write (push) item into queue
View (peek) item in queue
Mark item invisible
Execute process on item
Write results
Delete (pop) item from queue
So whether you get duplicate delivery or not, you still need to ensure that you process the item in the queue. If you delete it on pulling it off the queue, and then your server dies, you may lose that message forever. It enables aggressive scaling through the use of spot instances - and guarantees (using the above pattern), that you won't lose a message.
But - it doesn't guarantee once and only once delivery. But I don't think it's designed for that problem. I also don't think it's an insurmountable problem. In our case (and I can see why I've never noticed the issues before) - we're writing results to S3. It's no big deal if it overwrites the same file with the same data. Of course if it's a debit transaction going to a bank a/c, you'd probably want some sort of correlation ID... and most systems already have those in there. So if you get a duplicate correlation value, you throw an exception and move on.
Good question. Highlighted something for me.

taking a snapshot of complex mutable structure in concurrent environment

Given: a complex structure of various nested collections, with refs scattered in different levels.
Need: A way to take a snapshot of such a structure, while allowing writes to continue to happen in other threads.
So one the "reader" thread needs to read whole complex state in a single long transaction. The "writer" thread meanwhile makes modifications in multiple short transactions. As far as I understand, in such a case STM engine utilizes the refs history.
Here we have some interesting results. E.g., reader reaches some ref in 10 secs after beginning of transaction. Writer modifies this ref each 1 sec. It results in 10 values of ref's history. If it exceeds the ref's :max-history limit, the reader transaction will be run forever. If it exceeds :min-history, transaction may be rerun several times.
But really the reader needs just a single value of ref (the 1st one) and the writer needs just the recent one. All intermediate values in history list are useless. Is there a way to avoid such history overuse?
Thanks.
To me it's a bit of a "design smell" to have a large structure with lots of nested refs. You are effectively emulating a mutable object graph, which is a bad idea if you believe Rich Hickey's take on concurrency.
Some various thoughts to try out:
The idiomatic way to solve this problem in Clojure would be to put the state in a single top-level ref, with everything inside it being immutable. Then the reader can take a snapshot of the entire concurrent state for free (without even needing a transaction). Might be difficult to refactor to this from where you currently are, but I'd say it is best practice.
If you only want the reader to get a snaphot of the top level ref, you can just deref it directly outside of a transaction. Just be aware that the refs inside may continue to get mutated, so whether this is useful or not depends on the consistency requirements you have for the reader.
You can do everything within a (dosync...) transaction as normal for both readers and writer. You may get contention and transaction retries, but it may not be an issue.
You can create a "snapshot" function that quickly traverses the graph and dereferences all the refs within a transaction, returning the result with the refs stripped out (or replaced by new cloned refs). The reader calls snapshot once, then continues to do the rest of it's work after the snapshot is completed.
You could take a snapshot immediately each time after the writer finishes, and store it separately in an atom. Readers can use this directly (i.e. only the writer thread accesses the live data graph directly)
The general answer to your question is that you need two things:
A flag to indicate that the system is in "snapshot write" mode
A queue to hold all transactions that occur while the system is in snapshot mode
As far as what to do if the queue is overflows because the snapshot process isn't fast enough, well, there isn't much you can do about that except either optimize that process, or increase the size of your queue - it's going to be a balance that you'll have to strike depending on the needs of you app. It's a delicate balance, and is going to take some pretty extensive testing, depending on how complex your system is.
But you're on the right track. If you basically put the system in "snapshot write mode", then your reader/writer methods should automatically change where they are reading/writing from, so that the thread that is making changes gets all the "current values" and the thread reading the snapshot state is reading all the "snapshot values". You can split these up into separate methods - the snapshot reader will use the "snapshot value" methods, and all other threads will read the "current value" methods.
When the snapshot reader is done with its work, it needs to clear the snapshot state.
If a thread tries to read the "snapshot values" when no "snapshot state" is currently set, they should simply respond with the "current values" instead. No biggie.
Systems that allow snapshots of file systems to be taken for backup purposes, while not preventing new data from being written, follow a similar scheme.
Finally, unless you need to keep a record of all changes to the system (i.e. for an audit trail), then the queue of transactions actually doesn't need to be a queue of changes to be applied - it just needs to store the latest value of whatever thing you're changing in the system. When the "snapshot state" is cleared, you simply write all those non-committed values to the system, and call it done. The thing you might want to consider is making a log of those changes yet to be made, in case you need to recover from a crash, and have those changes still applied. The log file will give you a record of what happened, and can let you do this recovery. That's an oversimplification of the recovery process, but that's not really what your question is about, so I'll stop there.
What you are after is the state-of-the-art in high-performance concurrency. You should look at the work of Nathan Bronson, and his lab's collaborations with Aleksandar Prokopec, Phil Bagwell and the Scala team.
Binary Tree:
http://ppl.stanford.edu/papers/ppopp207-bronson.pdf
https://github.com/nbronson/snaptree/
Tree-of-arrays -based Hash Map
http://lampwww.epfl.ch/~prokopec/ctries-snapshot.pdf
However, a quick look at the implementations above should convince you this is not "roll-your-own" territory. I'd try to adapt an off-the-shelf concurrent data structure to your needs if possible. Everything I've linked to is freely available on the JVM, but its not native Clojure as such.