I have a sharded database (Postgres 9.4). I want data from some tables (4 or 5 tables with relatively rare updates) to be replicated across all the shards, but other tables shouldn't be replicated at all. What is the best way to do this?
Can it be done using only built-in Postgres functionality?
If not, what plugins can be used?
Thank you in advance.
With streaming replication that is not possible: it works at the WAL level and always replicates the entire cluster, so you cannot limit it to specific tables. For per-table replication on 9.4 you would need a trigger-based tool such as Slony, Bucardo, or Londiste, or the pglogical extension.
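If you end up rolling it yourself, for a handful of small, rarely-updated tables even a periodic copy job can be enough. Below is a minimal sketch using psycopg2; the table names and DSNs are made up, and truncate-and-reload is only reasonable because the tables are small and change rarely:

```python
# Periodically push a snapshot of a few small, rarely-updated reference
# tables from one source shard to every other shard. Names are hypothetical.
import psycopg2

SOURCE_DSN = "dbname=shard0 host=shard0.example.com"
TARGET_DSNS = [
    "dbname=shard1 host=shard1.example.com",
    "dbname=shard2 host=shard2.example.com",
]
SHARED_TABLES = ["countries", "currencies"]  # the 4-5 replicated tables

def sync_shared_tables():
    # Take a snapshot of each shared table on the source shard.
    snapshots = {}
    with psycopg2.connect(SOURCE_DSN) as src:
        with src.cursor() as cur:
            for table in SHARED_TABLES:
                cur.execute("SELECT * FROM {0}".format(table))
                cols = [d.name for d in cur.description]
                snapshots[table] = (cols, cur.fetchall())

    # Rewrite the tables on each target shard, one transaction per shard
    # (the connection context manager commits on clean exit).
    for dsn in TARGET_DSNS:
        with psycopg2.connect(dsn) as dst:
            with dst.cursor() as cur:
                for table, (cols, rows) in snapshots.items():
                    cur.execute("TRUNCATE {0}".format(table))
                    placeholders = ", ".join(["%s"] * len(cols))
                    cur.executemany(
                        "INSERT INTO {0} ({1}) VALUES ({2})".format(
                            table, ", ".join(cols), placeholders),
                        rows)
```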
We have a system with a large amount of data (about 10 million rows in one table). We developed it with the Django framework, and we also want to use AWS Lambda to serve it. Now I have some questions about it:
1- If we want to use Amazon RDS (MySQL, PostgreSQL), which one is better? And is a relational database a good solution for this?
2- I read somewhere that if we use a relational database with AWS Lambda, Django opens a new connection to the DB for each Lambda instance, which is awful. Is this correct?
3- If we want to use GraphQL and a graph database, is that a good solution? Or can we combine a Django REST API and GraphQL together?
4- If we don't use Django and use AWS AppSync instead, is that better or not? What are the limitations of using it?
Please help me.
Thanks
GraphQL is very useful for graph data, not time series. Your choice will depend on the growth factor, not the current row count. I currently run an RDS instance with 5 billion rows just fine; the problem is how it will grow over time. I suggest looking into archival strategies using things like S3 or AWS IoT Analytics (this one is really cool).
I wouldn't worry about concurrent connections until you have a proper reason to (50+ per second). Your DB will be the largest server you have anyway.
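To make the archival suggestion concrete, here is a rough sketch of the idea using boto3 and psycopg2: move rows older than a cutoff out of the hot RDS table into S3. The table, bucket, and column names are all invented for illustration:

```python
# Archive old rows to S3 as CSV, then delete them from the hot table.
import csv
import datetime
import io

import boto3
import psycopg2

def archive_old_rows(cutoff_days=365):
    cutoff = datetime.date.today() - datetime.timedelta(days=cutoff_days)
    s3 = boto3.client("s3")
    with psycopg2.connect("dbname=app host=mydb.example.com") as conn:
        with conn.cursor() as cur:
            # Extract the cold rows.
            cur.execute("SELECT * FROM events WHERE created_at < %s",
                        (cutoff,))
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow([d.name for d in cur.description])
            writer.writerows(cur.fetchall())

            # Stage them in S3 before touching the source table.
            s3.put_object(
                Bucket="my-archive-bucket",
                Key="events/archive-{0}.csv".format(cutoff.isoformat()),
                Body=buf.getvalue().encode("utf-8"))

            # Delete only after the upload succeeded; the surrounding
            # transaction commits on clean exit.
            cur.execute("DELETE FROM events WHERE created_at < %s",
                        (cutoff,))
```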
We are currently migrating a DynamoDB table to Spanner. Since DynamoDB is a NoSQL database with indexing, migrating from NoSQL to a relational database has turned out to be a difficult task. The only reason we are migrating to Spanner is its secondary indexing. But after migrating a few tables, we are seeing latency issues in Spanner. Initially we planned to migrate to Cloud Bigtable, but unfortunately it doesn't support secondary indexes. Now, because of the latency issues and high read/write traffic, Spanner performance is degrading. Is there any other data store in GCP that would be more suitable for this kind of use case, where we can have NoSQL as well as secondary indexes? We have around 200 TB of data in DynamoDB.
According to the Google Spanner documentation on Quotas & Limits, for improved performance you should have one node for every 2 TB of data. With around 200 TB, that works out to at least 100 nodes. Considering that, I would recommend you take a look at how many nodes you have active right now and raise that number to improve the performance of your database.
In this documentation here, you will find the best practices for configuring Spanner for the best possible performance.
In case this doesn't help, could you please take a look at the documentation Troubleshooting performance regressions? That way, you can take a further look at what might be affecting the performance of your Spanner instance.
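For example, resizing along the 2 TB-per-node guideline could look roughly like this with the google-cloud-spanner Python client; the instance ID and target node count are placeholders, so treat this as a sketch rather than a recipe:

```python
# Raise the node count of a Spanner instance (~1 node per 2 TB of data).
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-spanner-instance")  # placeholder ID
instance.reload()                 # fetch the current configuration
instance.node_count = 100         # ~100 nodes for 200 TB of data
operation = instance.update()     # returns a long-running operation
operation.result()                # block until the resize completes
```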
Let me know if the information helped you!
Go with Firestore in Datastore mode. It has secondary indexes, is basically serverless, offers practically unlimited throughput, and is a NoSQL DB as well.
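To illustrate, here is a quick sketch with the google-cloud-datastore Python client (the kind and property names are invented). Filtering on an indexed property is effectively a secondary-index lookup, and the single-property indexes are maintained for you automatically:

```python
# Write an entity, then query it back by a non-key property.
from google.cloud import datastore

client = datastore.Client()

# Put an entity; built-in single-property indexes are updated on write.
key = client.key("Order", "order-123")
entity = datastore.Entity(key=key)
entity.update({"customer_id": "cust-42", "status": "OPEN"})
client.put(entity)

# Query on a non-key property -- in effect, a secondary-index lookup.
query = client.query(kind="Order")
query.add_filter("customer_id", "=", "cust-42")
orders = list(query.fetch())
```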
I am reading about Redis and Memcached in the AWS console:
Redis
In-memory data structure store used as database, cache and message broker. ElastiCache for Redis offers Multi-AZ with Auto-Failover and enhanced robustness.
Memcached
High-performance, distributed memory object caching system, intended for use in speeding up dynamic web applications.
Has anyone used/compared both? What are the main differences and use cases for each?
Thanks.
Pasting my answer from another Stack Overflow question:
Select Memcached if you have these requirements:
You want the simplest model possible.
You need to run large nodes with multiple cores or threads.
You need the ability to scale out and in, adding and removing nodes as demand on your system increases and decreases.
You want to partition your data across multiple shards.
You need to cache objects, such as a database.
Select Redis if you have these requirements:
You need complex data types, such as strings, hashes, lists, and sets.
You need to sort or rank in-memory data-sets.
You want persistence of your key store.
You want to replicate your data from the primary to one or more read replicas for read intensive applications.
You need automatic failover if your primary node fails.
You want publish and subscribe (pub/sub) capabilities—to inform clients about events on the server.
You want backup and restore capabilities.
Here is an interesting whitepaper by AWS: https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
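To make the contrast above concrete, here is a tiny side-by-side sketch using the pymemcache and redis-py clients; the host names are placeholders. Memcached gives you plain get/set of opaque values, while Redis adds data structures such as sorted sets plus pub/sub:

```python
from pymemcache.client.base import Client as MemcachedClient
import redis

# Memcached: simple get/set of opaque values, nothing more.
mc = MemcachedClient(("memcached.example.com", 11211))
mc.set("page:home", b"cached page fragment", expire=300)
fragment = mc.get("page:home")

# Redis: native data structures, e.g. a sorted set for ranking...
r = redis.Redis(host="redis.example.com", port=6379)
r.zadd("leaderboard", {"alice": 120, "bob": 95})
top10 = r.zrevrange("leaderboard", 0, 9, withscores=True)

# ...and pub/sub to inform clients about events on the server.
r.publish("events", "leaderboard-updated")
```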
I have a working Django web application that currently uses PostgreSQL as the database. Moving forward, I would like to perform some analytics on the data and also generate reports, etc. I would like to use Amazon Redshift as the data warehouse for the above goals.
In order not to affect the performance of the existing Django web application, I was thinking of writing a NEW Django application that would leverage a READ-ONLY replica of the PostgreSQL database and continuously write data from the read-only replica to Amazon Redshift. My thinking is that the NEW Django application could handle some or all of the Extract, Transform, and Load functions.
My questions are as follows:
1. Does the Django ORM work well with Amazon Redshift? If yes, how does one handle the model schema translations? Any pointers in this regard would be greatly appreciated.
2. Is there any better alternative to achieve the goals listed above?
Thanks in advance.
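As a sketch of the ETL flow described above: one common pattern is to extract from the read-only replica, stage the data in S3, and load it into Redshift with COPY, since Redshift ingests S3 files far faster than row-by-row inserts through an ORM. All hosts, bucket names, credentials, and the IAM role below are placeholders:

```python
# Extract from the Postgres read replica, stage to S3, COPY into Redshift.
import csv
import io

import boto3
import psycopg2

def etl_orders():
    # Extract from the read-only replica.
    with psycopg2.connect("dbname=app host=replica.example.com") as pg:
        with pg.cursor() as cur:
            cur.execute("SELECT id, total, created_at FROM orders")
            buf = io.StringIO()
            csv.writer(buf).writerows(cur.fetchall())

    # Stage in S3.
    boto3.client("s3").put_object(
        Bucket="my-staging-bucket",
        Key="orders.csv",
        Body=buf.getvalue().encode("utf-8"))

    # Load into Redshift. psycopg2 works here because Redshift speaks
    # the Postgres wire protocol; COPY does the bulk load server-side.
    with psycopg2.connect(
            "dbname=dw host=my-cluster.redshift.amazonaws.com "
            "port=5439 user=etl password=secret") as rs:
        with rs.cursor() as cur:
            cur.execute("""
                COPY orders
                FROM 's3://my-staging-bucket/orders.csv'
                IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
                CSV
            """)
```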
I have a choice between HBase and Cassandra. I will be writing MapReduce tasks to process data.
So which will be the better choice, HBase or Cassandra? And which is better to use with Hive and Pig?
I have used both. I am not sure what @Tariq means by "modified without a cluster restart", as I don't restart the cluster when I modify Cassandra schemas. I have not used Pig or Hive, but from what I understand those just sit on MapReduce, and I have used the Cassandra MapReduce adapter, which works great. We also know people who have used PlayOrm with MapReduce a bit as well; PlayOrm does not yet have an HBase provider written. They have Cassandra and MongoDB right now, so you can write your client once and it works on either database. Of course, for features specific to each NoSQL store, you can get the driver and talk directly to the store instead of going through PlayOrm, but many features are very similar between NoSQL stores.
I would suggest HBase, as it has native MR support and runs seamlessly on top of your existing Hadoop cluster. Also, its simpler schema, which can be modified without a cluster restart, is a big plus. It provides easy integration with Pig and Hive as well.