I had created a simple table in DynamoDB called userId. I could view it in the AWS console and query it through some Java on my local machine. This morning, however, I could no longer see the table in the DynamoDB dashboard, but I could still query it through the Java code. The dashboard showed no tables at all (I only had the one, the now-missing 'userId'). So I created a new table using the dashboard, called it userId and populated it. However, when I now run my Java code to query it, it returns the items from the missing 'userId' table, not this new one! Any ideas what is going on?
OK, that's strange. I thought DynamoDB tables were not scoped by region, but I noticed that once I created this new version of 'userId' it was viewable under the eu-west region, and then I could see the different (previously missing!) 'userId' table in the us-east region. They both had the same table name but contained different items. I didn't think this was possible?
Most Amazon Web Services offerings are scoped to a single region. The main exceptions are Route 53 (DNS), IAM, and CloudFront (CDN). The reason is that you want to control the location of your data, mainly for regulatory reasons; often your data can't leave the US, Europe, or some other region.
It is possible to create high availability for your services within a single region using Availability Zones. This is how highly available services such as DynamoDB or S3 provide that functionality: by replicating the data between Availability Zones, but within a single region.
Related
I'm trying to understand DynamoDB replication and failover strategies but can't find any articles on the web that clarify them. I understand cross-region replication can be achieved in DynamoDB with Global Tables, but I also understand this is a multi-active setup, meaning there are multiple active tables and multiple replica tables. Is there a setup with a single active table and multiple replicas? I briefly read about this in this article but can't find any mention of it anywhere else, including the AWS documentation.
I'm also trying to understand failover strategies for both cases: is there a DynamoDB Java client that can fail over across AZs, for both reads and writes, in case of issues in one AZ?
DynamoDB Global Tables are always active-active, but you can treat them as active-passive if you prefer, and many people do. That's useful if you want to use features like condition expressions or transactions, or do any non-idempotent writes where the same item could be written around the same time in both regions, with the second write landing before the first replicates, because that would cause the first write to be effectively lost.
To do this you just route your write traffic to one region, and to fail over you decide when it's time to write to another. The failover region is always happy to become the active region if you'll let it.
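A minimal sketch of that routing pattern, assuming a global table named 'userId' replicated in us-east-1 and eu-west-1 (table name and regions are placeholders): writes go to a designated primary region and only fall back to the replica when the primary is unreachable.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;
import java.util.Collections;

public class ActivePassiveWriter {
    // Treat us-east-1 as the single "active" region; eu-west-1 stays a warm standby.
    private final AmazonDynamoDB primary =
            AmazonDynamoDBClientBuilder.standard().withRegion("us-east-1").build();
    private final AmazonDynamoDB standby =
            AmazonDynamoDBClientBuilder.standard().withRegion("eu-west-1").build();

    public void writeUser(String userId) {
        PutItemRequest request = new PutItemRequest()
                .withTableName("userId") // hypothetical global table name
                .withItem(Collections.singletonMap("userId", new AttributeValue(userId)));
        try {
            primary.putItem(request);   // normal path: every write goes to the primary region
        } catch (Exception e) {
            // Failover is purely an application-side decision; the replica is always writable.
            standby.putItem(request);
        }
    }
}

Catching every exception here is deliberately coarse; in practice you would fail over only on region-level errors and keep idempotency in mind, since both regions remain writable.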
As for AZs, DynamoDB is a regional service, meaning it always spans at least 3 AZs and would keep operating fine even if a full AZ went down. You don't have to worry about that.
Is there a setup with single-active table and multiple replicas
Unfortunately there is no single-active, multiple-replica setup for cross-region DynamoDB using global tables, so your failover strategy has to assume multiple active tables and multiple replica tables. Source: the docs.
For failover strategies
According to the docs:
If a single AWS Region becomes isolated or degraded, your application can redirect to a different Region and perform reads and writes against a different replica table.
This means failover is a fairly seamless process: every replica accepts reads and writes, and of course you can add custom logic to decide when to redirect traffic.
I have a React app using Amplify with auth enabled. The app has many users, all of whom are members of exactly one "client".
I would like to be able to limit access to the data in a Glue table to users that are members of the client, using IAM, so that I have a security layer as close to the data layer as possible.
I have a 'clientid' partition in the table. The table is backed by an s3 bucket, with each client's data stored in their own 'clientid=xxxxxx' folder. The table was created by a Glue job with the following option in the "write_dynamic_frame" method at the end, which created the folders.
{"partitionKeys": ["clientid"]},
My first idea was to use the clientid in the front end to bake the user's client ID into the query so it selects just their partition, but that is clearly open to abuse.
Then I tried using a Glue crawler to scan the existing table's S3 bucket, in the hope it would create one table per folder if I unchecked the "Create a single schema for each S3 path" option. However, the crawler 'sees' the folders as partitions (presumably at least in part due to the Hive partitioning structure), and I just get a single table again.
There are tens of thousands of clients and TBs of data, so moving or renaming data and manually creating tables is not feasible.
Please help!
I assume you have a mechanism in place already to assign an IAM role (individual or per client) to each user on the front end, otherwise that's a big topic that should probably be its own question.
The most basic way to solve your problem is to make sure that each IAM role only has s3:GetObject permission on the prefix of the partition(s) that the user is allowed to access. That way users can only access their own data and will receive an error if they try accessing other clients' data. They could potentially fish for which client IDs are valid, though, by trying different combinations and observing the difference between a query that hits no partition (allowed, since no files are accessed) and one that does hit a partition (denied).
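As a rough illustration, a per-client role's policy only needs s3:GetObject on that client's partition prefix. The bucket name and prefix layout below are assumptions, and the policy is built with the SDK's policy helper classes just to keep the example in Java:

import com.amazonaws.auth.policy.Policy;
import com.amazonaws.auth.policy.Resource;
import com.amazonaws.auth.policy.Statement;
import com.amazonaws.auth.policy.actions.S3Actions;

public class ClientScopedPolicy {
    // Produces a policy document granting read access only to one client's partition prefix.
    public static String forClient(String clientId) {
        Statement readOwnPartition = new Statement(Statement.Effect.Allow)
                .withActions(S3Actions.GetObject)
                // Hypothetical bucket and prefix layout matching the Glue partitioning: clientid=<id>/
                .withResources(new Resource(
                        "arn:aws:s3:::my-glue-data-bucket/mytable/clientid=" + clientId + "/*"));
        return new Policy().withStatements(readOwnPartition).toJson();
    }
}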
I think it would be better to create tables, or even databases, per client; that would let you put permissions at the Glue Data Catalog level too, disallowing queries entirely against databases/tables other than the user's own. Glue crawlers won't help you with that, unfortunately; they're too limited in what they can do, and will try to be helpful in unhelpful ways. You can create these tables easily with the Glue Data Catalog API, and you won't have to move any data: just point the tables' locations at the locations of the current partitions.
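A sketch of that per-client catalog approach with the Glue API; the database and table names, columns, Parquet format, and bucket layout are all assumptions, each table simply points at the client's existing partition prefix so no data moves, and the target database must already exist:

import com.amazonaws.services.glue.AWSGlue;
import com.amazonaws.services.glue.AWSGlueClientBuilder;
import com.amazonaws.services.glue.model.Column;
import com.amazonaws.services.glue.model.CreateTableRequest;
import com.amazonaws.services.glue.model.SerDeInfo;
import com.amazonaws.services.glue.model.StorageDescriptor;
import com.amazonaws.services.glue.model.TableInput;

public class PerClientTables {
    public static void createTableFor(String clientId) {
        AWSGlue glue = AWSGlueClientBuilder.defaultClient();

        StorageDescriptor sd = new StorageDescriptor()
                // Point straight at the client's existing folder; nothing is copied or moved.
                .withLocation("s3://my-glue-data-bucket/mytable/clientid=" + clientId + "/")
                .withColumns(new Column().withName("event_id").withType("string"),
                             new Column().withName("created_at").withType("timestamp"))
                .withInputFormat("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat")
                .withOutputFormat("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat")
                .withSerdeInfo(new SerDeInfo().withSerializationLibrary(
                        "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"));

        glue.createTable(new CreateTableRequest()
                .withDatabaseName("client_" + clientId)   // or a shared database with per-table grants
                .withTableInput(new TableInput()
                        .withName("events")               // hypothetical table name
                        .withTableType("EXTERNAL_TABLE")
                        .withStorageDescriptor(sd)));
    }
}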
I am new to AWS, so sorry if my question is basic; I got stuck on this term.
The AWS Global Infrastructure page says "18 geographic Regions". The word "geographic" is used together with Regions, which makes sense.
The third question in the DynamoDB FAQs says, "Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability."
Here, is "three geographically distributed" referring to Regions or Availability Zones? I'm a bit confused. If it refers to Regions, does it mean my data leaves my country (if my country has only 1 Region)?
Please suggest.
"Geographically distributed" in this documentation refers to Availability Zones, not Regions. As per the AWS documentation, when you create a table in one region it is replicated across zones within that region to ensure high availability. If you make changes to the table, they are applied to the replicas. The AZs are interconnected with low-latency networks.
The data is stored on SSD disks and automatically replicated across multiple Availability Zones in an AWS region, which brings the high availability and your data is durable.
If you create a table in one region, a table with the same name can also be created in other regions; table names are only unique within a region.
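For instance (region names chosen to match the situation above), the same credentials pointed at two different regional endpoints see two completely independent table lists:

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class RegionScopedTables {
    public static void main(String[] args) {
        AmazonDynamoDB usEast = AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.US_EAST_1).build();
        AmazonDynamoDB euWest = AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1).build();

        // Each call only returns the tables that exist behind that region's endpoint,
        // so a table named "userId" can exist in both regions with different items.
        System.out.println("us-east-1 tables: " + usEast.listTables().getTableNames());
        System.out.println("eu-west-1 tables: " + euWest.listTables().getTableNames());
    }
}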
If you want your table to be replicated in other regions, you must enable cross-region replication. For more details, refer to the DynamoDB documentation.
All Things about DynamoDB
Almost every AWS service's availability story revolves around two things: Multi-AZ (multiple data centers in a single region) and cross-region (different geographic locations across the globe), and DynamoDB is no different. By default, DynamoDB is a multi-AZ service, which means your data is replicated across three data centers (Availability Zones) within the region; for cross-region replication, you need to enable DynamoDB global tables (built on DynamoDB Streams).
Multi-Region Replication with DynamoDB
DynamoDB global tables are geographically distributed: they provide a fully managed solution for deploying a multi-region, multi-active database. Like every other geographically distributed database, global tables come with replication latency.
An important thing to note here is that DynamoDB does not offer cross-region strong consistency (this is in contrast with Cosmos DB, a similar offering from Azure).
From AWS documentation:
An application can read and write data to any replica table. If your application only uses eventually consistent reads and only issues reads against one AWS Region, it will work without any modification. However, if your application requires strongly consistent reads, it must perform all of its strongly consistent reads and writes in the same Region. DynamoDB does not support strongly consistent reads across Regions. Therefore, if you write to one Region and read from another Region, the read response might include stale data that doesn't reflect the results of recently completed writes in the other Region.
Also, global tables are not to be confused with global secondary indexes. Global indexes get their name because they span all of a table's partitions, not because they span regions.
"Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability."
This is specifically referring to the multi-AZ structure of DynamoDB, which helps in achieving high availability for your table; e.g. if one Availability Zone is down, you will still be able to access your table.
To answer "my data is going out of my country(if my country has only
1 Region)."
Multi-region replication is not on by default; you need to use global tables and specify the regions you want to replicate to, which means your data/table won't go to any other region unless you specifically want it to.
For more on global tables, refer to https://aws.amazon.com/dynamodb/global-tables/
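A sketch of turning that replication on explicitly, using the original (2017) global tables API; the table name and regions are placeholders, and each regional table must already exist with streams enabled before the call:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.CreateGlobalTableRequest;
import com.amazonaws.services.dynamodbv2.model.Replica;

public class EnableGlobalTable {
    public static void main(String[] args) {
        AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.standard()
                .withRegion("us-east-1").build();

        // Replication only happens between the regions you list here;
        // your data never leaves these regions unless you add more replicas.
        dynamo.createGlobalTable(new CreateGlobalTableRequest()
                .withGlobalTableName("userId")
                .withReplicationGroup(
                        new Replica().withRegionName("us-east-1"),
                        new Replica().withRegionName("eu-west-1")));
    }
}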
I have some 200 tables in the Singapore region and I want all of those tables in my Oregon region as well.
I only want the table definitions, the same as those created in Singapore, with:
with:
Partition key
Sort Key
Read capacity
Write capacity
GSI
LSI
Is there any way to do this, instead of doing it manually?
Thanks in advance
How did you create the 200 tables in the first place? Manually as well? If not, use the same script.
I am not aware of any existing DB export tool.
You could create a script though that fetches the vital information from each table and creates it again using the CLI: http://docs.aws.amazon.com/cli/latest/reference/dynamodb/
Alternatively: if groups of these tables are similarly structured, you could try writing a CloudFormation template per group.
You can use the ListTables and DescribeTable calls to enumerate all your tables in the Singapore region, then use the CreateTable API to recreate them in the new region.
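A minimal sketch of that approach (regions per the question; pagination and error handling mostly omitted): read each table's definition in ap-southeast-1 and recreate the key schema, capacity, and indexes in us-west-2. Items are not copied.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;
import java.util.stream.Collectors;

public class CopyTableDefinitions {
    public static void main(String[] args) {
        AmazonDynamoDB source = AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.AP_SOUTHEAST_1).build();   // Singapore
        AmazonDynamoDB target = AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.US_WEST_2).build();        // Oregon

        for (String name : source.listTables().getTableNames()) {   // note: listTables paginates at 100 names
            TableDescription t = source.describeTable(name).getTable();

            CreateTableRequest create = new CreateTableRequest()
                    .withTableName(t.getTableName())
                    .withAttributeDefinitions(t.getAttributeDefinitions())
                    .withKeySchema(t.getKeySchema())
                    .withProvisionedThroughput(new ProvisionedThroughput(
                            t.getProvisionedThroughput().getReadCapacityUnits(),
                            t.getProvisionedThroughput().getWriteCapacityUnits()));

            if (t.getGlobalSecondaryIndexes() != null) {
                create.withGlobalSecondaryIndexes(t.getGlobalSecondaryIndexes().stream()
                        .map(g -> new GlobalSecondaryIndex()
                                .withIndexName(g.getIndexName())
                                .withKeySchema(g.getKeySchema())
                                .withProjection(g.getProjection())
                                .withProvisionedThroughput(new ProvisionedThroughput(
                                        g.getProvisionedThroughput().getReadCapacityUnits(),
                                        g.getProvisionedThroughput().getWriteCapacityUnits())))
                        .collect(Collectors.toList()));
            }
            if (t.getLocalSecondaryIndexes() != null) {
                create.withLocalSecondaryIndexes(t.getLocalSecondaryIndexes().stream()
                        .map(l -> new LocalSecondaryIndex()
                                .withIndexName(l.getIndexName())
                                .withKeySchema(l.getKeySchema())
                                .withProjection(l.getProjection()))
                        .collect(Collectors.toList()));
            }
            target.createTable(create);   // definitions only; items are not copied
        }
    }
}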
In the AWS console, I can create a table in a given region, say Northern California. When I try to access that region from the client (I'm using the Faraday library written for Clojure, but it probably doesn't matter), it can't find the resource. When I create the table from the client, I don't see it in Northern California; so I looked around and found it in Northern Virginia. All my reads and writes are fine, they just happen to go to Northern Virginia.
Now, I don't really care what region the tables are stored in, but does this mean that I always have to create my tables from the client to ensure that the writes are going to the correct place? Is there some way to set the defaults so that the region in which my data is stored is always in the same place, or will the database always know where to look based on my credentials?
DynamoDB is available in many regions, and each region has its own endpoint. You can configure your Clojure DynamoDB client to use the proper endpoint for the region you want. In your case, I guess Northern Virginia (us-east-1) is the default endpoint/region that your client uses.
For example, in my Java code I do something like this to initialize my client:
// Point the client at the us-west-2 (Oregon) endpoint explicitly
dynamo = new AmazonDynamoDBClient(credentials);
Region usWest2 = Region.getRegion(Regions.US_WEST_2);
dynamo.setRegion(usWest2);
You can search for the region or endpoint setting in your Clojure code and configure it there.