Couchbase read / write concurrency - concurrency

I have a question about how Couchbase handles concurrency internally.
I tried researching their documentation, and all I found was that it depends on which locking mechanism you use in your application, the two main ones being:
Optimistic locking
Pessimistic locking
However, both of the above relate to the strategy we choose for saving data, i.e. whether we prefer to lock the document or not.
In our case, if we are not using either of those locking mechanisms in our application, how would Couchbase serve the document in the scenario below:
If application A writes a document A
At the very same instant, application B tries to read document A
My question is: will application B have to queue up to read the document, or will it be served the older version by default? (None of this goes through Sync Gateway; we are using the .NET DLL directly for writing and reading.)
Couchbase version 4.5.0

If you are using the Couchbase SDK and connecting directly to the Data Service, Couchbase is strongly consistent. If application A writes the document and application B reads it immediately afterwards, application B will get that version. The consistency comes from how Couchbase distributes the data and how the client SDK accesses it. Couchbase distributes each object to one of 1024 active shards (Couchbase calls them vBuckets). There are replicas, but I will not get into that here. When the SDK reads or writes an object directly, it takes the object ID you give it and passes it through a consistent CRC32 hash. The output of that hash is a number between 0 and 1023, the vBucket number. The SDK then looks in the cluster map (a JSON document distributed by the cluster) and finds where in the cluster that vBucket lives. The SDK then talks directly to that node and vBucket. That is how application A can write an object and application B can read it microseconds later: they are both reading from and writing to the same place. Couchbase does not scale reads from replicas. Replicas are only for HA.
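The key-to-node lookup described above can be sketched roughly as follows. This is a simplified illustration, not the SDK's actual code: the node names and the `CLUSTER_MAP` layout are made up, and the real SDK may manipulate the CRC32 bits differently before reducing them to a vBucket number.

```python
import zlib

NUM_VBUCKETS = 1024  # number of active vBuckets, as stated in the answer

# Hypothetical cluster map: vBucket number -> node that owns it.
# A real cluster map is a JSON document sent to the client by the cluster.
NODES = ["node-a", "node-b", "node-c"]
CLUSTER_MAP = {vb: NODES[vb % len(NODES)] for vb in range(NUM_VBUCKETS)}


def vbucket_for(key: str) -> int:
    """Map a document ID to a vBucket number in 0..1023 (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def node_for(key: str) -> str:
    """Find the node that owns the vBucket for this document ID."""
    return CLUSTER_MAP[vbucket_for(key)]
```

Because the hash depends only on the document ID, a writer and a reader using the same cluster map always arrive at the same node for the same document.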

Because, as Kirk mentioned in his reply, Couchbase is consistent, both the read and write requests in your scenario will go to the same node and access the same object in the server's memory. However, concepts like "at the same time" get fuzzy when talking about distributed systems with network latency and various IO queues involved. Ultimately, the order of execution of the two "simultaneous" requests will depend on the order in which the server receives them, but there is no deterministic way to predict what it will be. There are too many variables along the way: the CLR of one of the clients might decide to do garbage collection just then, delaying the request, or one of the client machines might experience momentary network lag, and so on. This is one of the reasons that the documentation recommends using explicit locking for concurrent writes, to enforce predictable behavior in the face of unpredictable request ordering.
In your scenario though, there is simply no way to know in which order the write and read will occur, because "at the same time" is not an exact concept. One possible solution in your case might be to use explicit versioning for the data. This can be a timestamp or some sort of revision number that's incremented every time the document is changed. Of course, using a timestamp runs into a similar problem as above, because it's recorded according to the clock of the machine application A runs on, which is most likely different from the clock where application B runs.
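A revision number that the store itself increments avoids the clock problem, because no client clock is involved. Here is a minimal in-memory sketch of that idea, similar in spirit to the optimistic locking mentioned in the question. `VersionedStore` and `VersionMismatch` are hypothetical names for illustration and are not part of any Couchbase SDK.

```python
import threading


class VersionMismatch(Exception):
    """Raised when a write is based on a stale revision."""


class VersionedStore:
    """In-memory stand-in for a document store with revision numbers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}  # key -> (revision, value)

    def get(self, key):
        """Return (revision, value); revision 0 means the key does not exist."""
        with self._lock:
            return self._docs.get(key, (0, None))

    def put(self, key, value, expected_rev):
        """Write only if the stored revision still equals expected_rev."""
        with self._lock:
            current_rev, _ = self._docs.get(key, (0, None))
            if current_rev != expected_rev:
                raise VersionMismatch(
                    f"{key}: expected rev {expected_rev}, found {current_rev}"
                )
            new_rev = current_rev + 1
            self._docs[key] = (new_rev, value)
            return new_rev
```

A reader that gets revision 3 knows it has seen everything up to the third write, and a writer holding a stale revision is rejected instead of silently overwriting newer data.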

Related

Real-time duplication of data among EC2 instances located in different regions

I'm new to AWS and back-end architecture in general. My current configuration is an EC2 instance (south-east region Singapore) running a Twisted real-time server for a real-time chat app.
Currently, in my implementation, whenever a sender sends a message to the server, it is stored in a Python dictionary on the server if the receiver is not online. So basically it is storing this message in the instance's RAM. Now, I want to make the app available worldwide, so I'll be running it on instances in different regions. So my question is, how am I supposed to duplicate/replicate this dictionary stored in the RAM of one instance to all the other instances, so it is readily available in all regions? (The reason for storing the messages in RAM and not in a database is the nature of the app. The app involves a large volume of messages sent in bursts, which requires it to be considerably faster than the speeds offered by a persistent DB store's I/O reads and writes.) My aim is to make the app available globally with real-time performance.
(Kindly don't flag this question as an "opinion-based" question and close it. I'm new to server side architecture and I really need someone to at least just point me in the right direction. And I don't think I'll be able to find help on this anywhere other than StackOverflow.)
Here's a few things I would think of if I had to build it myself (I've implemented most of these pointers in our own project and it took me quite a while).
If you really, really need all servers to be in sync, you'll need a consensus protocol. If you do, don't build this yourself. It's going to take a lot of time and you'll make a lot of errors.
If you can, partition your chat data into chatrooms and have only a few servers handle one chatroom.
I've used msgpack to encode my data. It's faster and smaller than json.
You'll benefit a lot from compressing your data before you send it over the wire. Have a look at something like zlib or lz4.
Even though compressed msgpack is almost the same size as compressed json, I'd choose msgpack because it's faster. It's easier to parse because it's length-prefixed.
I would try to send messages together. Batch up all messages every x ms. In my project I chose 100ms. Batching up messages will save you a lot of bandwidth, since your compression algorithm can remove more duplication.
You'll have to handle connection timeouts. Only regard a message as sent and done when you get a reply back (you'll have to design/choose your protocol to handle that)
Think about what is acceptable: how much data are you willing to lose when something crashes or otherwise fails? If you're not willing to lose data, you'll have to implement something that stores data to disk.
I've had the problem that writes to the database we use (Google Cloud Datastore) take a long time as well, somewhere between 100ms and 900ms depending on how much I store. What I did was store this data only every x seconds and set flags on objects that need to be saved on the next run. Of course, you can only do this if you're willing to lose some data when your program crashes.
You'll need something to keep track of what servers are running and which server is responsible for which piece of data
Set up something that checks whether your connection is alive. For example, send echo requests and echo replies every x seconds. The sooner you detect a failure the better. Note, however, that if your reactor is blocked by some CPU-intensive task, it will not send your echo in time.
If you're not in control of how much data comes in you'll have to slow down or penalize connections that would otherwise take up all of your server time.
EDIT: I only now see that you're looking into redis. As far as I know it's a good queueing system. Use that if you can. Implementing the stuff above would take a lot of time to get it right.
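The batching and compression advice above can be sketched like this. To keep the example standard-library only, it uses `json` where the answer recommends msgpack; the `MessageBatcher` class and the message fields are made up for illustration, and the timer that would call `flush()` every 100ms is left out.

```python
import json
import zlib


class MessageBatcher:
    """Collects messages and emits them as one compressed blob."""

    def __init__(self):
        self._pending = []

    def add(self, message: dict) -> None:
        self._pending.append(message)

    def flush(self) -> bytes:
        """Encode and compress everything queued so far, then clear the queue."""
        payload = json.dumps(self._pending, separators=(",", ":")).encode("utf-8")
        self._pending = []
        return zlib.compress(payload)


def decode_batch(blob: bytes) -> list:
    """Inverse of MessageBatcher.flush(): decompress and decode one batch."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Compressing a whole batch at once lets the compressor reuse repeated field names and values across messages, which is where the bandwidth saving comes from.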

How to restrict the resources within one JVM

I'm trying out WSO2 products, and I'm thinking about a scenario where bad code could take up all the CPU time (e.g. an infinite loop). I tried it on WSO2 AS with 2 tenants, A and B. A's bad code does affect B: B's app has a very long response delay or even gets stuck. Is there a way to restrict the CPU usage of a tenant? Thanks!
At the moment, you will have to setup your environment in what is known as private jet mode, where each tenant gets its own JVM, if you need total isolation.
In a shared environment, we have stuck thread detection which will ensure that critical threads will not run for more than a specified time period. We have plans for CPU usage limiting on per tenant basis. This would be available in a future release.
My suggestion would be to not run two tenants in one application server. Run two separate processes on the same machine. Better yet, run two separate processes in separate OS-level containers (like a jail or an lxc container). Or separate virtual machines if you can't use containers.
Operating systems give you tools for controlling CPU use - rlimit and nice for processes, and implementation-specific facilities for containers and VMs. Because they're implemented in the OS (or virtual machine manager), they are capable of doing this job correctly and reliably. There's no way an application server can do it anywhere near as well.
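As a rough illustration of the OS-level tools mentioned above, here is a POSIX-only sketch that starts a child process with a CPU-time limit and a lower scheduling priority. `run_limited` is a hypothetical helper, not part of WSO2 or any application server. When the child exceeds its soft CPU limit, the kernel sends it SIGXCPU.

```python
import os
import resource
import subprocess
import sys


def run_limited(cmd, cpu_seconds, niceness=10):
    """Run cmd in a child process with a CPU-time limit and lowered priority."""

    def apply_limits():
        # Runs in the child, after fork and before exec (POSIX only).
        os.nice(niceness)  # raise the nice value, i.e. lower the priority
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        if hard == resource.RLIM_INFINITY:
            soft = cpu_seconds
        else:
            soft = min(cpu_seconds, hard)  # the soft limit may not exceed the hard one
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)
```

For example, `run_limited([sys.executable, "tenant_a.py"], cpu_seconds=60)` would cap a hypothetical `tenant_a.py` at one minute of CPU time without affecting other processes on the machine.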
In any case, having separate applications share an application server and JVM is a terrible idea which should have been put to death in the '90s. There's just no need for it, and it introduces so many potential headaches.

What is the modern programming standard for synchronizing data between a web service and a client?

The question is a little general, so to help narrow the focus, I'll share my current setup that is motivating this question. I have a LAMP web service running a RESTful API. We have two client implementations: one browser-based javascript client (local storage store) and one iOS-based client (core data store). Obviously these two clients store data very differently, but the data itself needs to be kept in two-way sync with the remote server as often as possible.
Currently, our "sync" process is a little dumb (as in, non-smart). Conceptually, it looks like:
Client periodically asks the server for ALL of the most-recent data.
Server sends down the remote data, which overwrites the current set of local data in the client's store.
Any local creates/updates/deletes after this point are treated as gold, and immediately sent to the server.
The data itself is stored relationally, and updated occasionally by client users. The clients in my specific case don't care too much about the relationships themselves (which is why we can get away with local storage in the browser client for now).
Obviously this isn't true synchronization. I want to move to a system where, conceptually, a "diff" of the most recent changes is sent to the server periodically, and the server sends back a "diff" of the most recent changes it knows about. It seems very difficult to get to this point, but maybe I just don't understand the problem very well.
REST feels like a good start, but REST only talks about the way two data stores talk to each other, not how the data itself is synchronized between them. (This sync process is left up to the implementer of each store.) What is the best way to implement this process? Is there a modern set of programming design patterns that apply to inform a specific solution to this problem? I'm mostly interested in a general (technology agnostic) approach if possible... but specific frameworks would be useful to look at too, if they exist.
Multi-master replication is always (and will always be) difficult and bespoke, because how conflicts are handled will be specific to your application.
IMO a more robust approach is to use master-slave replication, with your web service as the master and the clients as slaves. To keep the clients in sync, use an archived Atom feed of the changes (see event sourcing) as per RFC 5005. This is the closest you'll get to a modern standard for this type of replication, and it's RESTful.
When the clients are online, they do not update their replica directly, instead they send commands to the server and have their replica updated via the atom feed.
When the clients are offline, things get difficult. Each client will need to have a model of how your web service behaves. It will also need an offline copy of its replica, which should be copied on write from the online replica (the online replica is the one that is updated by the Atom feed). When the client executes commands that modify the data, it should store the command (for later replay against the web service) and the expected result (for verification during replay), and update the offline replica.
When the client goes back online, it should replay the commands, compare the result with the expected result and notify the client of any variances. How these variances are handled will vary based on your application. The offline replica can then be discarded.
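The record-and-replay step can be sketched as below. This is only an outline under assumptions: `OfflineLog`, `RecordedCommand` and the `send` callable (which stands in for a call to the web service) are hypothetical names, and a real client would also have to persist the log and handle failures in the middle of a replay.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class RecordedCommand:
    name: str
    args: dict
    expected: Any  # result predicted by the client's offline model


@dataclass
class OfflineLog:
    commands: List[RecordedCommand] = field(default_factory=list)

    def record(self, name: str, args: dict, expected: Any) -> None:
        """Store a command issued while offline, with its expected result."""
        self.commands.append(RecordedCommand(name, args, expected))

    def replay(self, send: Callable[[str, dict], Any]) -> List[Tuple[RecordedCommand, Any]]:
        """Replay commands in order; return (command, actual) for each variance."""
        variances = []
        for cmd in self.commands:
            actual = send(cmd.name, cmd.args)
            if actual != cmd.expected:
                variances.append((cmd, actual))
        self.commands.clear()  # the offline log can be discarded after replay
        return variances
```

What to do with the returned variances (retry, ask the user, drop the change) is the application-specific part.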
CouchDB replication works over HTTP and does what you are looking to do. Once databases are synced on either end it will send diffs for adds/updates/deletes.
Couch can do this with other Couch machines or with a mobile framework like TouchDB.
https://github.com/couchbaselabs/TouchDB-iOS
I've done a fair amount of it, but you can always set up CouchDB on one machine, set up TouchDB on a mobile device and then watch the HTTP traffic go back and forth to get an idea of how they do it.
Or read this: http://guide.couchdb.org/draft/replication.html
Maybe something from the link above will help you get an idea of how to do your own diffs for your REST service. (Since they both work over HTTP, I thought it could be useful.)
You may want to look into the Dropbox Datastore API:
https://www.dropbox.com/developers/datastore
It sounds like it might be a very good fit for your purposes. They have iOS and javascript clients.
Lately, I've been interested in Meteor.
The platform sets up Mongo on the server and minimongo in the browser. The client subscribes to some data and when that data changes, the platform automatically sends down the new data to the client.
It's a clever solution to the syncing problem, and it solves several other problems as well. It will be interesting to see if more platforms do this in the future.

How to serve CPU intensive webservice requests in the cloud?

Background: I'm running a webservice in which each request involves a fair amount of computations (up to 10 seconds on a quadcore machine).
Each request can be broken down to about 150 independent (and equally small) subtasks.
What I'm after: I'm looking for a hosting service that allows me to serve these kinds of requests efficiently in a scalable manner.
What I've considered: I've looked into Google App Engine and Rackspace.
It seems to me as if GAE is intended for simple requests that require few resources to process. The problem with something like Rackspace is that I can't tell in advance how many vCPUs I may need (and even if I knew how big future spikes would be, I don't want to sit with, say, 40 servers idling the rest of the time).
Questions:
Would it be possible to use GAE in the following way:
For each request, split it up into 150 subtasks
Process all subtasks independently by making 150 concurrent HTTP requests to the same webapp (but through a different method)
Collect the results from the "subresults" and return a response to the original request.
Is there any possibility that Map Reduce for GAE could be of any help?
Is there any other service better suited for this task?
Yes, this is possible. The usual way would be to use Task Queue, possibly via DeferredTask helper class.
Normal web requests (to frontend instances) are limited to 30s, so doing this synchronously is not guaranteed to succeed. Also note that instances are artificially limited to 10 parallel requests (if multithreading is enabled).
Yes, this is a job for map reduce. But note that map reduce is async - you give it tasks to do and it will be done sometime in the future.
Given the processing you need, you might want to look at GAE backends (they are long-running, support multithreading and come in different sizes). If you need even more processing power, then you might want to look at Compute Engine.
Unless all of these 150 subtasks are read-only activities, trying to run them all in a single thread is just not safe. Web requests are unreliable - people can cancel, hit refresh if it takes too long, close windows in the middle, or just time out due to network issues. The background HTTP requests, likewise, can have a whole mess of problems. The standard solution is to have your front-end code simply build a list of things that need to be done, so it can get back to the user quickly, and have a back-end 'worker' process handle the (potentially unreliable) subtasks. Depending on what your application is doing, you might bounce the user to a "working" screen (like searching for airfare) where they can safely wait for the results to come up, or it might just be stuffed away as a "pending" job (like ordering something from Amazon).
There are countless different ways to handle this basic workflow. If you stick with Google App Engine, they have a "task queue" as part of the platform, providing a simple mechanism for creating and dispatching background tasks. If you go with Rackspace, their cloud offering is less of a unified platform, so you'll have to either roll your own queue or get one to plug into your setup.
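The split, process and collect workflow from the question can be sketched with a local worker pool. This is only a stand-in for a real task queue: `split_request` and `run_subtask` are hypothetical placeholders, and threads are used just to show the shape of the fan-out. Genuinely CPU-bound subtasks would need separate processes or machines, which is what a hosted queue with backend workers gives you.

```python
from concurrent.futures import ThreadPoolExecutor


def split_request(request, parts=150):
    """Break one request into independent subtasks (placeholder logic)."""
    return [(request, index) for index in range(parts)]


def run_subtask(subtask):
    """Placeholder for the real computation done on one subtask."""
    _request, index = subtask
    return index * index


def handle_request(request, max_workers=16):
    """Fan the subtasks out to a worker pool, then combine the partial results."""
    subtasks = split_request(request)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(run_subtask, subtasks))  # keeps input order
    return sum(partials)
```

With a real queue, `handle_request` would enqueue the subtasks and return immediately, and a later request (or a callback) would collect the results.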

why is the lift web framework scalable?

I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. So does it use the actor library to scale?
Is the scalability built in right out of the box? Just add additional servers and nodes and it will automatically scale: is that how it works? Can it handle 500000+ concurrent connections with supporting servers?
I am trying to create a web services framework for the enterprise level, that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is just adding more servers and you should be able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30 - 90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests will be determined by the number of threads.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application; it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request-handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
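The effect of decoupling requests from threads can be demonstrated with Python's asyncio. This is an analogy only: it is not how Lift or Scala actors are implemented, and `handle_request` is a made-up handler whose `asyncio.sleep` stands in for database or disk I/O.

```python
import asyncio
import threading


async def handle_request(request_id, io_delay=0.01):
    """Simulated request; awaiting hands the thread to other requests."""
    await asyncio.sleep(io_delay)  # stands in for database or disk I/O
    return request_id, threading.get_ident()


async def serve(n_requests):
    """Start all requests at once and wait for every one to finish."""
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))


def run(n_requests=1000):
    """Return (requests completed, distinct threads that handled them)."""
    results = asyncio.run(serve(n_requests))
    threads_used = {thread_id for _, thread_id in results}
    return len(results), len(threads_used)
```

All of the simulated requests are in flight at the same time, yet a single thread services them, because a request waiting on I/O does not hold on to the thread.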
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
Ain't quite right. Lift is a highly stateful framework. For example, if a user requests a form, they can only post it back to the same machine the form came from, because the form-processing action is saved in the server's state.
This actually hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
There's no doubt that Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to configure sticky sessions on the load balancer, which will route a user to the same machine for the duration of a session.
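One simple way a load balancer can implement sticky sessions is to derive the backend from the session ID. The sketch below assumes a fixed list of hypothetical backend names. Real load balancers more often use a cookie or consistent hashing, since plain modulo hashing remaps most sessions whenever the backend list changes.

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical Lift instances


def backend_for(session_id: str, backends=BACKENDS) -> str:
    """Pick the same backend for every request that carries this session ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Every request in a session lands on the instance that holds that session's server-side state, which is the property Lift needs.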
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like @dre's reply, as he correctly points out that Lift's statefulness is a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (not the content) on this post: http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution would be, as @dre said, sticky-session configuration on the load balancer at the front, plus adding more instances. But since request handling in Lift is done by a thread + actor combination, you can expect one instance to handle more requests than in normal frameworks. This gives Lift an edge over sticky sessions in other frameworks, i.e. each individual instance's capacity to process more requests may help you scale.
You also have Akka-Lift integration, which would be another advantage here.