Implement a counter in the etcd API?

Any suggestions for implementing a shared distributed counter in etcd using the API? I imagine I can create a lock that manages a KV. But is there a different "best practice" way to do this in etcd?
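One lighter-weight pattern than a lock is a compare-and-swap loop built on an etcd transaction, so concurrent writers retry instead of blocking. Here is a minimal sketch, assuming the python-etcd3 client; the key name 'counter' is just an example, not an official recipe:

import etcd3

client = etcd3.client(host='localhost', port=2379)

def increment(key='counter'):
    # Atomically increment the counter with a compare-and-swap transaction.
    while True:
        value, meta = client.get(key)
        current = int(value) if value is not None else 0
        # version 0 means "the key does not exist yet"; otherwise the
        # transaction commits only if nobody changed the key since we read it.
        expected_version = meta.version if meta is not None else 0
        succeeded, _responses = client.transaction(
            compare=[client.transactions.version(key) == expected_version],
            success=[client.transactions.put(key, str(current + 1))],
            failure=[],
        )
        if succeeded:
            return current + 1

The lock-based variant you describe should also work (python-etcd3 exposes client.lock(...)), but the transaction keeps the check-and-update atomic on the server side.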

Related

How can we achieve loose coupling in a 'Service Oriented Architecture' (SOA)? Explain using a suitable solution

An IT architecture is composed of software that has been exposed as “services” – i.e. invoked on demand using a standard communication protocol. So, how does SOA achieve loose coupling? Please give a good example.
There are three major approaches that have been emerging for bringing together disparate information and systems in a business. As different service providers and businesses race towards providing solutions to customers and consumers, these approaches help to meet the requirements for coarse-grained, loosely coupled and asynchronous services.
1. The Enterprise Service Bus
The first approach that helps to build and implement an optimal SOA is the enterprise service bus, or ESB. This approach helps to coordinate and arrange the different elements that take the form of distributed services on a network. It treats the systems as discrete, distributed services that connect to one another through an asynchronous, message-oriented infrastructure. This kind of message-oriented infrastructure makes it possible to have loosely coupled connections between independent services or modules.
2. Business Process Management
Many companies have, for many years now, tried to solve business process problems by implementing the Business Process Management (BPM) approach. This approach treats IT assets and systems as activities or tasks that participate in well-synchronized and well-orchestrated business processes. The main challenge of BPM is that its tools are mainly used for modeling and designing processes rather than for constructing processes that can reach integration objectives. But BPM solutions on their own are not enough to meet SOA requirements, because they do not provide the runtime environment that is needed for loosely coupled modules.
3. Service Oriented Integration
The third and last approach to a proper implementation of SOA is service-oriented integration. This approach uses architectural guiding principles to build an ecosystem of services that businesses can combine dynamically into higher-level processes that meet ever-changing and evolving requirements. It moves past tightly coupled and brittle modules by drawing a distinction between the consumer and the producer of a service, and it thereby imposes the loose coupling that is needed to implement SOA properly and meet business requirements. Even this approach by itself is not sufficient to guarantee long-running interactions between services.

Difference between Zookeeper and a managed replicated database service

I just came across Zookeeper and am wondering what the difference is between Zookeeper and an available, consistent, durable, distributed, replicated database service like AWS DynamoDB, or even AWS S3 (a storage service) for that matter. Key features like configuration management, distributed synchronization, etc. can very well be achieved with a database offering like AWS DynamoDB. I understand that there are architectural differences between Zookeeper and products like DynamoDB, but from a feature perspective, are there any major differences between the two?
Is there any reason to use Zookeeper over the others?
First, let me tell you some basics about Zookeeper which you may already know:
Zookeeper is not a database.
Zookeeper is a coordination service.
Zookeeper is highly available and capable of managing more than 4000 nodes in a cluster.
Zookeeper stores all its information in znodes, and every znode can hold at most 1 MB of data.
Zookeeper provides three types of znodes: ephemeral, sequential, and persistent.
Now, to answer your query:
Zookeeper is used for providing exclusive locks to services where there is a master-slave architecture and you want only one service to be active and perform all the reads/writes.
Zookeeper can also be used for sessions: an ephemeral node is created per user session, and when the user logs out, the node is automatically deleted from Zookeeper's memory.
Zookeeper is reliable and fault-tolerant, and it performs in-memory operations, which makes it even faster.
These are the main reasons why Zookeeper is chosen over other services for coordination.
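To make the session example concrete, here is a small sketch using the kazoo Python client (the host and path are placeholders chosen only for illustration):

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

# Ephemeral znode representing one logged-in user; Zookeeper removes it
# automatically as soon as this client's session ends (logout or crash).
zk.create('/sessions/user-42', b'logged-in', ephemeral=True, makepath=True)

zk.stop()  # the session ends and /sessions/user-42 disappears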
Zookeeper, in a nutshell, is a distributed kernel: it provides low-level primitives on top of which you can build complex distributed systems.
1) Zookeeper provides ordered messages, which is essential for distributed locks (and for distributed systems in general). DynamoDB does not provide a per-client ordered-message guarantee.
2) Sequential znodes provide an atomic way to add elements in order under a common prefix string. Combined with ephemeral nodes and ordered notifications, they let you build recipes such as locks and leader election.
Let's say you want to lock customerABCD in order to perform some work; every machine can call
Create('/customerABCD/lock-', Sequential)
If two nodes perform the Create above, the znodes formed will be
/customerABCD/lock-1 and /customerABCD/lock-2.
To decide who the leader is, you can simply call
Get('/customerABCD') and pick the child with the smallest sequence number. Now let's say the node which created lock-1 dies; then the owner of lock-2 will get a notification from Zookeeper and can claim ownership of customerABCD.
More examples of such distributed tasks are in https://learning.oreilly.com/library/view/zookeeper/9781449361297/ch02.html
In DynamoDB, the machine which created the /customerABCD/lock-2 entry would have to poll to know whether the lock still exists. This is a slow way to acquire a lock, since it relies on timeout-based polling; it is inefficient because compute is needed to perform the polls, and it also adds polling load to the system.
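A rough sketch of that recipe using the kazoo Python client (the host is a placeholder; kazoo also ships a ready-made Lock recipe that implements this more carefully):

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()
zk.ensure_path('/customerABCD')

# Ephemeral + sequential: the znode gets a monotonically increasing suffix
# and is deleted automatically if this client's session dies.
my_path = zk.create('/customerABCD/lock-', ephemeral=True, sequence=True)
my_node = my_path.rsplit('/', 1)[1]

def check_ownership(event=None):
    # Whoever holds the lowest sequence number owns the lock / is the leader.
    children = sorted(zk.get_children('/customerABCD', watch=check_ownership))
    if children and children[0] == my_node:
        print('I now own customerABCD')

check_ownership()

(A production version would watch only its immediate predecessor to avoid a thundering herd, which is what kazoo's built-in zk.Lock('/customerABCD') recipe does.)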
3) When znodes are added or removed, the zxid and the znode version counters are incremented. This forms the basis of versioning, which distributed systems can use to implement locking with fencing, as explained in "Making the lock safe with fencing" at https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
Again, DynamoDB does not seem to have a similar auto-incrementing parent sequence number facility.
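As a rough illustration of using those counters as fencing tokens (a sketch with the kazoo Python client; whether you fence on the data version or on the zxid is a design choice):

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

data, stat = zk.get('/customerABCD')
fencing_token = stat.version   # bumped on every write to this znode
last_write_zxid = stat.mzxid   # cluster-wide transaction id of the last write

# Pass fencing_token along with writes to the protected resource so that
# a stale lock holder (carrying a smaller token) can be rejected.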

Choosing akka persistence db

The default database for Akka persistence is LevelDB, but I saw that there are plugins for Redis, MongoDB, and others.
What are the factors that I should take into account when choosing between them? In my use case, I want to use persistence for recovery after processes fail. The state of the actor is a big data structure with a high throughput of read/write operations.
Another option is for the actor to use Redis directly, without any Akka persistence plugin. Is that option less preferable?

Is there any way to share stateful variables in a Dataflow pipeline?

I'm making a Dataflow pipeline with Python.
I want to share stateful variables across pipeline transforms and across worker nodes, like global variables (shared across multiple workers).
Is there any way to support this?
Thanks in advance.
Stateful processing may be of use for sharing state between the workers executing a specific pipeline node (it would not be able to share state between transforms, though):
https://beam.apache.org/blog/2017/02/13/stateful-processing.html
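As a minimal sketch of what the linked post describes (assuming the apache_beam Python SDK; note that state is scoped per key and window, so the input must be keyed and this is not a truly global variable):

import apache_beam as beam
from apache_beam.transforms.userstate import CombiningValueStateSpec

class RunningCountFn(beam.DoFn):
    # One persistent state cell per key (and window): a running count.
    COUNT_STATE = CombiningValueStateSpec('count', sum)

    def process(self, element, count=beam.DoFn.StateParam(COUNT_STATE)):
        key, value = element            # stateful DoFns require (key, value) input
        count.add(1)                    # update the durable state cell
        yield key, value, count.read()  # read back the current total

with beam.Pipeline() as p:
    (p
     | beam.Create([('user1', 'a'), ('user1', 'b'), ('user2', 'c')])
     | beam.ParDo(RunningCountFn())
     | beam.Map(print))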

Akka clustering - force actors to stay on specific machines

I've got an Akka application that I will be deploying on many machines. I want each of these applications to communicate with each other by using the distributed publish/subscribe event bus features.
However, if I set the system up for clustering, then I am worried that actors for one application may be created on a different node from the one they started on.
It's really important that an actor is only created on the machine that the application it belongs to was started on.
Basically, I don't want the elasticity or the clustering of actors, I just want the distributed pub/sub. I can see options like singleton or roles, mentioned here http://letitcrash.com/tagged/spotlight22, but I wondered what the recommended way to do this is.
There is currently no feature in Akka which would move your actors around: either you programmatically deploy to a specific machine or you put the deployment into the configuration file. Otherwise it will be created locally as you want.
(Akka may one day get automatic actor tree partitioning, but that is not even specified yet.)
I think this is not the best way to use elastic clustering, but we also ran into the same issue and found that it can be useful to spread actors over the nodes by a hash of the entity id (like database shards). For example, on each node we create one NodeRouterActor that proxies messages to multiple WorkerActors. When we send a message to a NodeRouterActor, it selects the endpoint node by looking up key id % nodeCount in a hash table; the endpoint NodeRouterActor then proxies the message to the specific WorkerActor which controls the entity.
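A small language-agnostic sketch of that selection step (the names are hypothetical; in an Akka system the same lookup would live inside the NodeRouterActor):

import zlib

NODES = ['node-0', 'node-1', 'node-2']  # hypothetical cluster members

def owner_node(entity_id: str) -> str:
    # Deterministic hash, so every node computes the same owner for an id.
    return NODES[zlib.crc32(entity_id.encode()) % len(NODES)]

print(owner_node('customerABCD'))  # messages for this entity are proxied to that node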