CouchDB with Clouseau plugin is taking more storage than expected

I've been using an AWS instance with CouchDB as a backup (via replication) to the IBM Cloudant database of my application.
Everything seems to work fine, but I've noticed the volume usage on the AWS instance growing steadily (it fills up all the time, with the annoying chore of growing a volume when there's no space left in the partition).
(Screenshot: actual use of storage)
The data in the screenshot is using almost 250 GB. I would like to know the possible reason for this issue; my guess is that the Clouseau plugin is using extra space to enable search index queries.
As I'm not an expert with this database, could anyone explain to me why this is happening and how I could mitigate the issue?
My best regards!

If you are only backing up a Cloudant database to a CouchDB instance via replication, you should not need Clouseau enabled.
Clouseau is only required for search indexes, and if you are not running search queries against your backup database you can disable it there. The search indexes are not copied by the replication.
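For reference, a minimal sketch of setting up such a replication from the backup CouchDB side, using the standard _replicator database over HTTP (account, database names, and credentials below are placeholders):

```python
# Minimal sketch: continuous replication from Cloudant to a backup CouchDB.
# The account, database names, and credentials are placeholders.
import requests

SOURCE = "https://USER:PASS@ACCOUNT.cloudant.com/mydb"  # Cloudant source
TARGET = "http://localhost:5984/mydb"                   # local backup target

resp = requests.post(
    "http://localhost:5984/_replicator",
    json={
        "_id": "cloudant-to-backup",
        "source": SOURCE,
        "target": TARGET,
        "continuous": True,      # keep the backup in sync
        "create_target": True,   # create the target DB if it does not exist
    },
    auth=("admin", "password"),  # admin credentials of the backup CouchDB
)
resp.raise_for_status()
print(resp.json())
```

Only documents are replicated this way; any search indexes on the backup side would have to be rebuilt locally, which is exactly the space cost you can avoid by leaving Clouseau disabled.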

Related

General guidance around Bigtable performance

I'm using a single-node Bigtable cluster for my sample application running on GKE. The autoscaling feature has been incorporated within the client code.
Sometimes I experience slowness (>80ms) for the GET calls. In order to investigate it further, I need some clarity around the following Bigtable behaviour.
I have cached the Bigtable table object to ensure faster GET calls. Is the table object persistent on GKE? I have learned that objects are not persistent on Cloud Functions. Should we expect similar behaviour on GKE?
I'm using service account authentication, but how frequently do auth tokens get refreshed? I have seen frequent refresh logs for the gRPC Java client. I think Bigtable won't be able to serve requests during this token-refresh period (4-5 seconds).
What if the client machine/instance doesn't scale enough? Will it cause slowness for GET calls?
Bigtable client libraries use connection pooling. How frequently do connections/channels close themselves? I have learned that connections are closed after minutes of inactivity (>15 minutes or so).
I'm planning to read only the needed columns instead of the entire row. This can be achieved by specifying the row key as well as a column qualifier filter. Can I expect some performance improvement by not reading the entire row?
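To make that last point concrete, here is roughly the read I have in mind, sketched with the google-cloud-bigtable Python client for brevity even though we use the Java client; project, instance, table, family, and qualifier names are placeholders:

```python
# Minimal sketch: read a single row but only the columns we need,
# using a row key plus family/qualifier filters.
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project", admin=False)  # placeholder project
instance = client.instance("my-instance")                    # placeholder instance
table = instance.table("my-table")                           # placeholder table

# Restrict the read to one column family and one qualifier.
col_filter = row_filters.RowFilterChain(filters=[
    row_filters.FamilyNameRegexFilter("cf1"),
    row_filters.ColumnQualifierRegexFilter(b"status"),
])

row = table.read_row(b"user#12345", filter_=col_filter)  # placeholder row key
if row is not None:
    cell = row.cells["cf1"][b"status"][0]
    print(cell.value)
```

Reading fewer cells mainly reduces the amount of data transferred per call, so the gain depends on how wide the rows are.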
According to the official GCP docs, you can find the causes of slower Bigtable performance here. I would suggest going through those docs, as they might be helpful. Also see Troubleshooting performance issues.

"Max storage size not supported" When Upgrading AWS RDS

I am using db.m5.4xlarge, but our user count has increased a lot and the server has become too slow, so we want to upgrade the RDS instance to db.m5.8xlarge. But when I try to upgrade RDS, it gives me an error (Max storage size not supported).
I think the reason is that, unlike db.m5.4xlarge, db.m5.8xlarge does not support MySQL. From the docs:
Judging from the discussion with you, I think it might actually be more beneficial for you to look at creating read replicas rather than an ever-growing instance.
The problem with increasing the instance size as you are doing now is that every time it will simply hit another bottleneck, and it remains a single point of failure.
Instead, the following strategy is more appropriate and may end up saving you cost:
Create read replicas in RDS to handle all read-only SQL queries; by doing this you are going to see performance gains over your current setup. You might even be able to scale down the write instance.
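As a rough illustration, a read replica can be created from the existing instance with a single API call (a sketch using boto3; region, identifiers, and instance class are placeholders):

```python
# Minimal sketch: create an RDS read replica of an existing instance.
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder region

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-read-replica-1",   # name for the new replica
    SourceDBInstanceIdentifier="mydb",            # your existing writer instance
    DBInstanceClass="db.m5.4xlarge",              # replicas can use a smaller class
)
# Point your application's read-only queries at the replica's endpoint,
# and keep writes on the source instance.
```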
As your application is read-heavy, look at adding caching to avoid heavy read load on the database. AWS provides ElastiCache, a managed service using either Redis or Memcached as the caching engine. This again could save you money, as you won't need as many live reads.
If you choose to include caching too, take a look at these caching strategies to work out how you would want to use it.
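To give an idea of the cache-aside pattern, here is a minimal sketch assuming a Redis-compatible ElastiCache endpoint (the endpoint, key scheme, and query_database() are placeholders, not part of your setup):

```python
# Minimal cache-aside sketch: check the cache first, fall back to the DB on a miss.
import json
import redis

cache = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

def query_database(user_id):
    """Placeholder for the real (expensive) SQL read against RDS."""
    return {"id": user_id, "name": "example"}

def get_user(user_id, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: no read hits the database.
        return json.loads(cached)
    # Cache miss: read from the database, then populate the cache with a TTL.
    user = query_database(user_id)
    cache.setex(key, ttl_seconds, json.dumps(user))
    return user
```

The TTL keeps stale entries bounded; the caching-strategies link above covers the trade-offs between this and write-through approaches.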

Can data remain uncorrupted and consistent if a snapshot is taken of a busy database in Google Compute Engine?

We installed our own MySQL on GCE and we are thinking of using GCE snapshots as a backup solution. As our MySQL database is quite busy, we would like to know: if we take a snapshot while it is still in production, can the data in the snapshot remain uncorrupted and consistent? Thanks.
As described in the Best Practices for Persistent Disk Snapshots documentation, if your database is in use during the snapshot you may have some data loss.
If you don't have too many writes but lots of reads, that could do the trick, as the chance of losing new data will be smaller, but that's still not a 100% sure thing.
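If you do snapshot a live disk, the docs recommend flushing and freezing the filesystem first so the snapshot is at least crash-consistent. A minimal sketch of that sequence, assuming a Linux VM with the gcloud CLI and the MySQL data directory on its own persistent disk (disk name, zone, and mount point are placeholders):

```python
# Minimal sketch: freeze the data disk, snapshot it, then unfreeze.
import subprocess

DISK = "mysql-data-disk"           # placeholder disk name
ZONE = "us-central1-a"             # placeholder zone
MOUNT_POINT = "/mnt/disks/mysql"   # placeholder mount point of the data disk

def run(cmd):
    subprocess.run(cmd, check=True)

# Flush OS buffers and freeze the filesystem so the snapshot is crash-consistent.
run(["sync"])
run(["sudo", "fsfreeze", "-f", MOUNT_POINT])
try:
    # Take the snapshot of the persistent disk while writes are paused.
    run([
        "gcloud", "compute", "disks", "snapshot", DISK,
        f"--zone={ZONE}",
        f"--snapshot-names={DISK}-backup",
    ])
finally:
    # Always unfreeze, even if the snapshot command fails; keep this window short.
    run(["sudo", "fsfreeze", "-u", MOUNT_POINT])
```

A crash-consistent snapshot still relies on MySQL's crash recovery when restored, so logical backups (e.g. mysqldump) remain the safer complement for a busy database.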

Is this normal for GCP Cloud SQL disk usage?

I created a Cloud SQL DB for learning purposes a while ago and have basically never used it for anything. Yet the storage / disk space keeps climbing:
(Updated image to show the timescale: this climb seems to happen within just a few hours!)
Is this normal? If not, how do I troubleshoot / prevent this steady climb? The only operations against the db seem to be backup operations. I'm not doing any ops (afaik).

Can long-running query performance be improved by using AWS?

As a data warehouse team, we deal with millions of records in and out on a daily basis. We have jobs running every day that load data onto SQL Server Flex clones from an Oracle DB through ETL loads. As we are dealing with a huge amount of data and complex queries, queries run pretty long, stretching into hours. So we are looking towards using AWS. We want to set up our own licensed Microsoft SQL Server on EC2, but I was wondering how this will improve the performance of long-running queries. What would be the main reason that the same query takes longer on our own servers and executes faster on AWS? Or did I misunderstand the concept? (Just letting you know I am in a learning phase.)
PS: We are still in an R&D phase. Any thoughts or opinions regarding AWS for long-running queries would be greatly appreciated.
You need to provide more details on your question.
What is your query?
How big are the tables?
What is the bottleneck? CPU? IO? RAM?
AWS is just infrastructure.
It does make your life easier because you can scale your machine up or down with a few clicks.
You can crank up your machine to however big you want, but even so, nothing will solve a bad query and a bad architecture.
Keep in mind, EC2 comes with two types of disk: EBS and ephemeral.
EBS is network-attached (SAN-like) storage; ephemeral storage is attached to the EC2 host itself.
Ephemeral will be much faster, of course, but the downside is that when you shut down your EC2 instance and start it up again, all of the data on that drive is wiped clean.
As for licensing (Windows and SQL Server), it is baked into the pre-built EC2 AMI (Amazon Machine Image).
I've never used my own license in EC2.
With the same DB and the same hardware configuration, a query will perform similarly on AWS as on-premises. You need to check whether you have configured the DB / indexes etc. optimally. Also, think about replicating the data to another database that is optimized for querying huge amounts of data.