Connect to and work with a RethinkDB cluster - database-replication

I can't seem to find a lot of documentation on how the clusters in RethinkDB actually work.
In Cassandra I connect to a cluster by defining one or more hosts, so if one of them is down, or has even been removed, I can still connect to the whole cluster until the code/configuration is updated to reflect the changed host IP addresses.
As far as I understand, RethinkDB doesn't have such logic and I'd need to implement it myself, but I'd still be connected to the whole cluster at all times, is that correct?
When creating a database, it is "kind of" created for the whole cluster; there is no way and no need to specify the exact servers that will take care of it. When creating a table, if I don't specify a primary replica tag, which server will be the primary replica? If I specify a tag that is assigned to multiple servers, the same question applies. How is the final server which will be the main replica selected?

In Cassandra I connect to a cluster by defining one or more hosts, so if one of them is down, or has even been removed, I can still connect to the whole cluster until the code/configuration is updated to reflect the changed host IP addresses.
In RethinkDB, you connect to the cluster by connecting to a node in the cluster. That node will take care of communicating with all the other nodes in the cluster. If that node disconnects from the cluster, then you might not be able to do reads or writes, depending on your cluster's sharding and replication. If that node fails, you won't be able to do anything. At that point, you can try connecting to another node.
As far as I understand, RethinkDB doesn't have such logic and I'd need to implement it myself
Yes, RethinkDB won't automatically reconnect you to another node in the cluster if your node fails. That being said, this might be as simple as having multiple connections and switching between them (unless I'm missing something!).
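As a concrete illustration of that "multiple connections" idea, here's a minimal sketch using the official RethinkDB Python driver; the host names are placeholders and the failover logic is deliberately simple:
```python
# Minimal sketch of client-side failover across known cluster nodes,
# assuming the official RethinkDB Python driver. Host names are placeholders.
from rethinkdb import r
from rethinkdb.errors import ReqlDriverError

# Any node can act as the entry point; list several so one failure
# doesn't leave the client stranded.
CLUSTER_NODES = [("db1.example.com", 28015),
                 ("db2.example.com", 28015),
                 ("db3.example.com", 28015)]

def connect_to_cluster(db="test"):
    """Try each known node in turn and return the first working connection."""
    last_error = None
    for host, port in CLUSTER_NODES:
        try:
            return r.connect(host=host, port=port, db=db)
        except ReqlDriverError as exc:
            last_error = exc          # node down or unreachable, try the next one
    raise last_error

conn = connect_to_cluster()
print(r.table_list().run(conn))
```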
When creating a database, it is "kind of" created for the whole cluster; there is no way and no need to specify the exact servers that will take care of it.
Yes, when you create a database it's created for the whole cluster. A database doesn't really 'live' on a specific node; only tables do.
When creating a table, if I don't specify a primary replica tag, which server will be the primary replica?
RethinkDB will automatically take care of that. It will pick the server where the primary replica will be, based on the following:
Server distribution load (which servers already hold more tables and data).
Whether a specific server was already a primary/secondary for that table.
If you want to control which server the primary or secondary ends up on, you can set it manually through the table_config table in the rethinkdb database. (Take a peek at that database; it gives you a better view into how RethinkDB works!)
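For example, here's a minimal sketch of pinning a table's primary replica by writing to table_config with the Python driver; the host, database, table, and server names are placeholders:
```python
# Sketch: pin the primary replica of a table by editing table_config directly.
# The host, database, table, and server names are placeholders; adjust to your setup.
from rethinkdb import r

conn = r.connect(host="db1.example.com", port=28015)

r.db("rethinkdb").table("table_config").filter(
    {"db": "test", "name": "users"}
).update({
    "shards": [{
        "primary_replica": "server_a",           # server that should hold the primary
        "replicas": ["server_a", "server_b"]     # full replica set for this shard
    }]
}).run(conn)
```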
If I specify a tag that is assigned to multiple servers, the same question applies.
Same as above.
How is the final server which will be the main replica selected?
Same as above.
In terms of documentation, I would suggest the following:
Sharding and replication: http://rethinkdb.com/docs/sharding-and-replication/ (Although your questions suggest you probably already read this :))

Related

What is the difference between nodes, clusters, and databases in Redshift

Reading through the AWS documentation, I'm quite confused by these three concepts:
Cluster: composed of one or more compute nodes; contains one or more databases
Compute nodes: run the query execution plans and transmit data among themselves to serve these queries
Database: User data is stored on the compute nodes
With this it is easy to assume that a compute node and a database are the same thing, isn't it? But when creating a Redshift cluster, a portion of the form is named "database configuration" yet it seemingly refers to the cluster. Below is an image of it. If my understanding of the documentation is correct, "database configuration" should refer to the compute nodes and not the cluster.
With these, what exactly are a cluster, a database, and a compute node?
With this it is easy to assume that a compute node and a database are the same thing, isn't it?
No, that's not the case. You can have a single-node Redshift cluster with multiple databases, or a single (large) database hosted on multiple compute nodes.
Basically, node refers to the hardware layer of Redshift, while database refers to the software layer only.
Your screenshot shows only a default database called dev. You can create many more if you want. All hosted on the same cluster.
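To make the distinction concrete, here's a hedged sketch using psycopg2: you connect to the cluster endpoint (the hardware, whether one node or many) and can create additional databases on it (the software layer). The endpoint, credentials, and database names below are placeholders:
```python
# Sketch: several databases can live on the same Redshift cluster.
# Endpoint, credentials, and names below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # cluster endpoint
    port=5439,
    dbname="dev",          # the default database shown in the console
    user="awsuser",
    password="...",
)
conn.autocommit = True     # CREATE DATABASE cannot run inside a transaction
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE analytics;")   # second database, same cluster/nodes
conn.close()
```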

Does AWS take down individual availability zones (AZs) or whole regions for maintenance

AWS has a maintenance window for each region.
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/maintenance-window.html describes this, but I could not find any documentation about how it works with multiple AZs in the same region.
I have a Redis cache configured with a replica in a different AZ in the same region. The whole purpose of configuring a replica in a different AZ is that if one AZ is not available, traffic is served from the other AZ.
When AWS does maintenance, does it take down the whole region or individual availability zones?
You should read the FAQ on ElastiCache maintenance https://aws.amazon.com/elasticache/elasticache-maintenance/
It says that if you have a Multi-AZ deployment, AWS will take down the instances one at a time, triggering a failover to the read replica, and then create new instances before taking down the rest, so you should not experience any interruptions in your service.
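If you want to verify that your replication group is actually configured for that failover behaviour, here's a hedged boto3 sketch (the replication group ID is a placeholder):
```python
# Sketch: verify that automatic failover / Multi-AZ is enabled on a
# replication group, so node replacement fails over instead of going dark.
# The replication group ID is a placeholder.
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")
resp = elasticache.describe_replication_groups(ReplicationGroupId="my-redis-group")
group = resp["ReplicationGroups"][0]

print("AutomaticFailover:", group.get("AutomaticFailover"))  # expect "enabled"
print("MultiAZ:", group.get("MultiAZ"))                      # expect "enabled"
```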
Thanks @morras for the above link, which explains how ElastiCache handles the maintenance window period. Below are three questions I have taken from that link, with explanations.
1. How long does a node replacement take?
A replacement typically completes within a few minutes. The replacement may take longer with certain instance configurations and traffic patterns. For example, Redis primary nodes may not have enough free memory and may be experiencing high write traffic. When an empty replica syncs from this primary, the primary node may run out of memory trying to handle the incoming writes as well as sync the replica. In that case, the master disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully. It is also possible that the replica may never sync if the incoming write traffic remains high.
Memcached nodes do not need to sync during replacement and are always replaced fast irrespective of node sizes.
2. How does a node replacement impact my application?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. For single node Redis clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ or Cluster Mode is enabled, replacing the primary triggers a failover to a read replica. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary will be unavailable during this time.
For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.
3. What best practices should I follow for a smooth replacement experience and minimize data loss?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable. You can provision primary and read replicas in different availability zones. In this case, when a node is replaced, the data will be synced from a peer node in a different availability zone. For single node Redis clusters, we recommend that sufficient memory is available to Redis, as described here. For Redis replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
For Memcached nodes, schedule your maintenance window during a period with low incoming write traffic, test your application for failover and use the ElastiCache provided "smarter" client. You cannot avoid data loss as Memcached has data purely in memory.
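Along the lines of the low-traffic recommendation above, the maintenance window itself can also be moved via the API; a hedged boto3 sketch, with the group ID and window as placeholders:
```python
# Sketch: shift the maintenance window to a low-write-traffic period.
# The replication group ID and the window itself are placeholders.
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")
elasticache.modify_replication_group(
    ReplicationGroupId="my-redis-group",
    PreferredMaintenanceWindow="sun:03:00-sun:04:00",  # ddd:hh24:mi-ddd:hh24:mi, UTC
    ApplyImmediately=True,
)
```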

Automated setup for multi-server RethinkDB cluster via an ECS service

I'm attempting to set up a RethinkDB cluster with 3 servers total, spread evenly across 3 private subnets, each in a different AZ in a single region.
Ideally, I'd like to deploy the DB software via ECS and provision the EC2 instances with auto scaling, but I'm having trouble trying to figure out how to instruct the RethinkDB instances to join a RethinkDB cluster.
To create/join a cluster in RethinkDB, when you start up a new instance of RethinkDB you specify the host:port combination of one of the other machines in the cluster. This is where I'm running into problems. The Auto Scaling service creates new primary ENIs for my EC2 instances and uses a random IP in my subnet's range, so I can't know the IP of the EC2 instance ahead of time. On top of that, I'm using awsvpc task networking, so ECS creates new secondary ENIs dedicated to each Docker container and attaches them to the instances when it deploys them, and those also get new IPs, which I don't know ahead of time.
So far I've worked out one possible solution, which is to not use an Auto Scaling group, but instead to manually deploy 3 EC2 instances across the private subnets, which would let me assign my own, predetermined, private IPs. As I understand it, this still doesn't help me if I'm using awsvpc task networking, because each container running on my instances will get its own dedicated secondary ENI and I won't know the IP of that secondary ENI ahead of time. I think I can switch my task networking to bridge mode to get around this. That way I can use the predetermined IP of the EC2 instances (the primary ENI) in the RethinkDB join command.
So, in conclusion, the only way I can figure out to achieve this is to not use Auto Scaling or awsvpc task networking, both of which would otherwise be very desirable features. Can anyone think of a better way to do this?
As mentioned in the comments, this is more of an issue around the fact you need to start a single RethinkDB instance one time to bootstrap the cluster and then handle discovery of the existing cluster members when joining new members to the cluster.
I would have thought RethinkDB would have published a good pattern for this, because it's going to be pretty common when setting up clusters, but I couldn't see anything useful in their docs. If someone does know of an official recommendation then you should definitely use that rather than what I'm about to propose, especially as I have no experience with running RethinkDB.
This is just spit-balling and completely untested (at least for now), but the principle is: start a single, one-off instance of RethinkDB to bootstrap the cluster, have more cluster members join, then ditch the special-case bootstrap member that didn't attempt to join a cluster and leave the remaining cluster members running.
The bootstrap instance is easy enough to consider. You just need a RethinkDB container image and an ECS task that runs it in stand-alone mode, with the ECS service only running one instance of the task. To let the second set of cluster members easily discover cluster members, including this bootstrapping instance, it's probably easiest to use a service discovery mechanism such as the one offered by ECS, which uses Route53 records under the covers. The ECS service should register the service in the RethinkDB namespace.
Then you should create another ECS service that's basically the same as the first, but whose entrypoint script lists the services in the RethinkDB namespace, resolves them, discards the container's own IP address, and then uses a discovered host with --join when starting RethinkDB in the container.
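Here's a hedged Python sketch of that entrypoint logic; the service discovery DNS name, the ports, and the "is the port open" check (which also covers the improvement suggested further down) are all assumptions on my part, not anything RethinkDB or ECS prescribe:
```python
# Sketch of the entrypoint logic described above, assuming an ECS service
# discovery DNS name like "rethinkdb.local" that resolves to every registered
# task. Names, ports, and the discovery scheme are assumptions.
import os
import socket

SERVICE_DNS = "rethinkdb.local"   # assumed Cloud Map / Route 53 service name
CLUSTER_PORT = 29015              # RethinkDB intracluster port

def reachable(ip, port=CLUSTER_PORT, timeout=2):
    """Only join peers that already have the cluster port open."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

own_ip = socket.gethostbyname(socket.gethostname())
peers = {addr[4][0] for addr in socket.getaddrinfo(SERVICE_DNS, CLUSTER_PORT)}
peers.discard(own_ip)                          # never try to join ourselves
joinable = [ip for ip in peers if reachable(ip)]

cmd = ["rethinkdb", "--bind", "all"]
for ip in joinable:
    cmd += ["--join", f"{ip}:{CLUSTER_PORT}"]  # join every live peer we found
os.execvp(cmd[0], cmd)                         # hand over to RethinkDB
```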
I'd then set the non-bootstrap ECS service to just 1 task at first, to allow it to discover the bootstrap version, and then you should be able to keep adding tasks to the service one at a time until you're happy with the size of the non-bootstrapped cluster, leaving you with n + 1 instances in the cluster including the original bootstrap instance.
After that I'd remove the bootstrap ECS service entirely.
If an ECS task in the non-bootstrap ECS service dies for whatever reason, it should be able to rejoin automatically without any issue, as it will just find a running RethinkDB task and join that.
You could probably improve the choice of which cluster member to join by checking that the RethinkDB port is open and responding before using that member to join. That way it will handle multiple tasks being started at the same time (with my original suggestion, a task could potentially find another task that is itself still trying to join the cluster and attempt to join that first, with them all potentially deadlocking if none of them happened to pick an existing cluster member by chance).
As mentioned, this answer comes with a big caveat: I haven't got any experience running RethinkDB and I've only played with the service discovery mechanism that was recently released for ECS, so I might be missing something here, but the general principles should hold fine.

Solr data migration between AWS EC2 instances

I am planning to set up a Solr server on an EC2 instance. As traffic grows I might have to move the Solr server from a smaller instance to a bigger one. But this change will need to happen in real time while the old Solr instance is serving traffic, so I am concerned that during the switch some valuable data that has been indexed could get lost. Also, the data from the old server will need to be moved to the new server, which would take a significant amount of time.
Also when the traffic cannot be handled by the largest server, SolrCloud will need to be deployed on multiple servers and the same data migration issue could occur.
Is there an efficient and a more robust way to do this?
You could probably:
Do start using SolrCloud from the get-go, but just with a single node/one shard. At this point there is nothing "Cloud-dy" about it, but no harm done either.
When traffic grows, create the new, bigger EC2 instance and add it to the cluster. Now you have a 'working' SolrCloud cluster with a replica.
As needed, keep adding nodes and creating more shards/replicas (a sketch of adding a replica via the Collections API follows below).
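As a concrete illustration of steps 2 and 3, here's a hedged sketch of asking the Solr Collections API to place a replica on the new node; the host, collection, shard, and node names are placeholders:
```python
# Sketch: once the new, bigger node has joined the SolrCloud cluster,
# ask the Collections API to place a replica of the existing shard on it.
# Host, collection, shard, and node names are placeholders.
import requests

SOLR = "http://old-solr-node.example.com:8983/solr"

resp = requests.get(f"{SOLR}/admin/collections", params={
    "action": "ADDREPLICA",
    "collection": "products",
    "shard": "shard1",
    "node": "new-solr-node.example.com:8983_solr",  # target the new instance
    "wt": "json",
})
resp.raise_for_status()
print(resp.json())
```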

Scaling Up an Elasticache Instance?

I'm currently running a site which uses Redis through Elasticache. We want to move to a larger instance with more RAM since we're getting to around 70% full on our current instance type.
Is there a way to scale up an ElastiCache instance in the same way an RDS instance can be scaled?
Alternatively, I wanted to create a replica group and add a bigger instance to it. Then, once it's replicated and running, promote the new instance to be the master. This doesn't seem possible through the AWS console, as the replicas are created with the same instance type as the primary node.
Am I missing something, or is it simply a use case which can't be achieved? I understand that I can start a bigger instance, manually deal with replication, and then move the web servers over to use the new server, but this would require some downtime due to DNS migration, etc.
Thanks!
Alan
Elasticache feels more like a cache solution in the memcached sense of the word, meaning that to scale up, you would indeed fire up a new cluster and switch your application over to it. Performance will degrade for a moment because the cache would have to be rebuilt, but nothing more.
For many people (I suspect you included), however, Redis is more of a NoSQL database solution in which data loss is unacceptable. Amazon offers the read replicas as a "solution" to that problem, but it's still a bit iffy. Of course, it offers replication to reduce the risk of data loss, but it's still nowhere near as production-safe (or mature) as RDS for a Redis database (as opposed to a cache, for which it's quite perfect), which offers backup and restore procedures, as well as well-structured change management to support scaling up. To my knowledge, ElastiCache does not support changing the instance type for a running cluster. This suggests that it's merely an in-memory solution that would lose all its data on reboot.
I'd go as far as saying that if data loss concerns you, you should look at a self-rolled Redis solution instead of simply using ElastiCache. Not only is it marginally cheaper to run, it would enable you to change the instance type like you would on any other EC2 instance (after stopping it, of course). It would also enable you to use RDB or AOF persistence.
You can now scale up to a larger node type while ElastiCache preserves your data:
https://aws.amazon.com/blogs/aws/elasticache-for-redis-update-upgrade-engines-and-scale-up/
Yes, you can scale a running ElastiCache instance up to a larger node type almost instantly. I've tested it and experienced very little actual downtime (I think a few seconds at first, and it was back online very quickly, even though the console showed the process taking roughly a few minutes to actually finish). I went from a t2.micro to an m3.medium with no problem.
You can scale up or down:
Go to the ElastiCache service.
Select the cluster.
From the Actions menu at the top, choose Modify.
Modify the Node Type (a scripted equivalent is sketched below).
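The same change can be scripted rather than clicked through; here's a hedged boto3 sketch with the group ID and node type as placeholders:
```python
# Sketch: the API equivalent of the console's "Modify" action, changing the
# node type of a running Redis replication group. IDs and sizes are placeholders.
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")
elasticache.modify_replication_group(
    ReplicationGroupId="my-redis-group",
    CacheNodeType="cache.m5.large",   # the new, larger node type
    ApplyImmediately=True,            # otherwise it waits for the maintenance window
)
```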
If you have a cluster, you can add more shards, decrease the number of shards, rebalance slot distributions, or add more read replicas; just click on the cluster itself to see those options.
Be aware that when you delete shards, ElastiCache automatically redistributes their data to the remaining shards, which adds traffic and can overload them; you will get a warning about this when you try to delete a shard.
If you still need more help, please feel free to leave a comment and I would be more than happy to help.