When reading the DynamoDB DAX documentation - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.concepts.cluster.html
I noticed random alphanumeric characters being used in the cluster endpoint, such as l6fzcv. Does anyone know what they stand for? Are they constant for DAX clusters in a region?
Example endpoint from docs: dax://my-cluster.l6fzcv.dax-clusters.us-east-1.amazonaws.com
I believe the random alphanumeric characters you are seeing in the cluster endpoint, such as "l6fzcv" in "my-cluster.l6fzcv.dax-clusters.us-east-1.amazonaws.com", are neither constant for DAX clusters in a region nor meaningful in themselves. They are simply unique identifiers that AWS generates to identify the specific DAX cluster. Going by the documentation example, the endpoint format for a DAX cluster is: clustername.identifier.dax-clusters.region.amazonaws.com
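To make the pieces of the endpoint easier to see, here is a minimal sketch that splits an endpoint shaped like the documentation example into its parts. The function name and the dictionary keys (including "generated_id") are my own labels, not AWS terminology, and the check only covers the shape shown in the example above.

```python
from urllib.parse import urlparse


def split_dax_endpoint(endpoint: str) -> dict:
    """Split an endpoint shaped like the docs example into labelled parts.

    Assumed shape: dax://<cluster>.<id>.dax-clusters.<region>.amazonaws.com
    The key names below are illustrative labels, not official AWS terms.
    """
    parsed = urlparse(endpoint)
    labels = (parsed.hostname or "").split(".")
    if (
        len(labels) != 6
        or labels[2] != "dax-clusters"
        or labels[4:] != ["amazonaws", "com"]
    ):
        raise ValueError(f"unexpected endpoint shape: {endpoint!r}")
    return {
        "scheme": parsed.scheme,
        "cluster_name": labels[0],
        "generated_id": labels[1],  # the opaque part the question asks about
        "region": labels[3],
    }


parts = split_dax_endpoint(
    "dax://my-cluster.l6fzcv.dax-clusters.us-east-1.amazonaws.com"
)
print(parts["cluster_name"], parts["generated_id"], parts["region"])
```

Only the cluster name and the region are things you choose; the middle identifier is whatever AWS assigned, so read it from the cluster's endpoint rather than hard-coding it.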
I'm trying to understand DynamoDB replication and failover strategies but cannot find any articles on the web that clarify them. I understand that cross-region replication can be achieved in DynamoDB with Global Tables, but I also understand that this is a multi-active setup, meaning there are multiple active tables and multiple replica tables. Is there a setup with a single active table and multiple replicas? I briefly read about this in this article but cannot find it mentioned anywhere else, including the AWS documentation.
I'm also trying to understand failover strategies for both cases. Is there a DynamoDB Java client which can fail over across AZs, for both reads and writes, in case of issues in one AZ?
DynamoDB Global Tables are always active-active, but you can treat them as active-passive if you prefer. Many people do. That's useful if you want to use features like condition expressions or transactions, or do any non-idempotent writes. With such writes, the same item could be written at around the same time in both regions, with the second write happening before the first has replicated, and the first write would then effectively be lost.
To do this you just route your write traffic to one region, and to fail over you decide when it's time to write to another. The failover region is always happy to be an active region if you'll let it.
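The routing idea above can be sketched in a few lines. This is only an illustration under my own assumptions: the class name, the region names, and the idea of keeping the regions in a preference-ordered list are all hypothetical, and deciding when to fail over is left to you.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteRegionRouter:
    """Tracks which replica region currently receives all writes.

    Hypothetical helper: every global-table replica accepts writes, so
    "active-passive" is purely a convention enforced by the application.
    """

    regions: List[str]  # preference-ordered replica regions (example values)
    active_index: int = 0

    @property
    def write_region(self) -> str:
        return self.regions[self.active_index]

    def fail_over(self) -> str:
        """Promote the next region in the list to be the single write target."""
        if len(self.regions) < 2:
            raise RuntimeError("no other replica region to fail over to")
        self.active_index = (self.active_index + 1) % len(self.regions)
        return self.write_region


router = WriteRegionRouter(["us-east-1", "eu-west-1"])
print(router.write_region)  # region to send writes to right now
router.fail_over()          # call this when you decide the region is unhealthy
print(router.write_region)
```

In a real application you would build the DynamoDB client for router.write_region (for example with boto3.client("dynamodb", region_name=router.write_region)) and rebuild it after a failover.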
As for AZs, DynamoDB is a regional service meaning it crosses at least 3 AZs always and would keep operating fine even if a full AZ were to be down. You don't have to worry about that.
"Is there a setup with single-active table and multiple replicas?"
Unfortunately, there is no single-active, multiple-replica setup for cross-region replication in DynamoDB using global tables, so the failover strategy will be for multiple active tables and multiple replica tables (source: docs).
For failover strategies
According to the docs:
"If a single AWS Region becomes isolated or degraded, your application can redirect to a different Region and perform reads and writes against a different replica table."
This means failover is a smooth process, because every replica table already accepts reads and writes. The redirect itself is done by your application, so you can add custom logic for when to redirect.
We have hundreds of DynamoDB tables.
For performance optimization, we're going to use DynamoDB Accelerator (DAX).
While exploring DAX, I came across two approaches.
1. A unified cache cluster that can be used for all DynamoDB tables
2. A separate cluster for each DynamoDB table
At first glance, it seems #2 may be better because the clusters are isolated from each other, so no table's cluster can affect another table's cluster. However, manageability may be a bit complex!
Is that correct OR am I missing anything? Which approach would be better and why?
In the end, we used a combination of both approaches to get the benefits of each. Sharing it in case it helps others!
To elaborate, we created multiple clusters, and each cluster serves a different set of DynamoDB tables.
One last note: remember that only one node in a cluster handles write operations to DynamoDB, and the rest of the nodes are just read replicas. Take this into account when deciding which set of tables goes on which cluster.
This is the error we get in Athena: HIVE_UNKNOWN_ERROR: Error creating an instance of com.facebook.presto.hive.lakeformation.CachingLakeFormationCredentialsProvider
The bucket is registered with Lake Formation
Role used for querying Athena has been given full access in Lake Formation to the database and all the tables in the database
Role has been given access to the underlying s3 bucket in the Data Locations section of Lake Formation.
Contacted AWS support. Turns out the problem was that I had "-" and "." in my Athena database name. According to Athena documentation:
"The only acceptable characters for database names, table names, and column names are lowercase letters, numbers, and the underscore character." (https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html#schema-names)
For some reason this was not a problem when we were working outside Lake Formation, but as soon as we registered the S3 location in LF, it started failing. I have confirmed that removing those characters from the database name solves the problem.
Make sure you included the slash (/) after the bucket name
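If you want to catch this before it reaches Athena, a small check against the rule quoted above could look like the sketch below. The function names are my own, and replacing every disallowed character with an underscore is just one possible clean-up policy, not something the documentation prescribes.

```python
import re

# Rule quoted from the Athena docs: lowercase letters, numbers, underscore.
_VALID_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_athena_name(name: str) -> bool:
    """Return True if the name uses only the characters the docs allow."""
    return _VALID_NAME.fullmatch(name) is not None


def sanitize_athena_name(name: str) -> str:
    """Lowercase the name and replace every other character with "_".

    This replacement policy is an assumption made for illustration.
    """
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


for candidate in ["sales_2021", "my-db.v1"]:
    print(candidate, is_valid_athena_name(candidate), sanitize_athena_name(candidate))
```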
Ultimately, I would like to obtain a list of tables in a particular schema that haven't been queried in the last two weeks (say).
I know that there are many system tables that track various things about how the Redshift cluster is functioning, but I have yet to find one that I could use to obtain the above.
Is what I want to do possible?
Please have a look at our "Unscanned Tables" query: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/unscanned_table_summary.sql
If you have enabled audit logging for the cluster, the activity data is stored in the S3 bucket that you configured when enabling logging.
According to the AWS documentation, the audit log bucket structure is as follows.
AWSLogs/AccountID/ServiceName/Region/Year/Month/Day/AccountID_ServiceName_Region_ClusterName_LogType_Timestamp.gz
For example: AWSLogs/123456789012/redshift/us-east-1/2013/10/29/123456789012_redshift_us-east-1_mycluster_userlog_2013-10-29T18:01.gz
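As a rough sketch, an object key with the structure above can be split into its fields like this. The function name and the returned key names are my own, and the parsing assumes that neither the log type nor the timestamp contains an underscore, which holds for the example key.

```python
import re

# Mirrors the documented layout:
# AWSLogs/AccountID/ServiceName/Region/Year/Month/Day/<file name>.gz
_KEY_PATTERN = re.compile(
    r"^AWSLogs/(?P<account_id>\d+)/(?P<service>[^/]+)/(?P<region>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<file>[^/]+)\.gz$"
)


def parse_audit_log_key(key: str) -> dict:
    """Split an audit log object key into its documented fields."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not an audit log key: {key!r}")
    fields = match.groupdict()
    file_name = fields.pop("file")
    # The file name repeats account, service and region before the rest.
    prefix = "{account_id}_{service}_{region}_".format(**fields)
    if not file_name.startswith(prefix):
        raise ValueError(f"file name does not match its path: {key!r}")
    # Split from the right: <cluster>_<log type>_<timestamp>
    pieces = file_name[len(prefix):].rsplit("_", 2)
    if len(pieces) != 3:
        raise ValueError(f"unexpected file name: {file_name!r}")
    fields["cluster_name"], fields["log_type"], fields["timestamp"] = pieces
    return fields


info = parse_audit_log_key(
    "AWSLogs/123456789012/redshift/us-east-1/2013/10/29/"
    "123456789012_redshift_us-east-1_mycluster_userlog_2013-10-29T18:01.gz"
)
print(info["cluster_name"], info["log_type"], info["timestamp"])
```

Filtering keys by log type and date this way lets you pick out only the files you need before downloading and scanning them.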
I had created a simple table in DynamoDB called userId. I could view it in the AWS console and query it through some Java on my local machine. This morning, however, I could no longer see the table in the DynamoDB dashboard, but I could still query it through the Java. The dashboard showed no tables at all (I only had one, the missing 'userId'). I then just created a new table using the dashboard, called it userId and populated it. However, now when I run my Java to query it, the code is returning the items from the missing 'userId' table, not this new one! Any ideas what is going on?
Ok, that's strange. I thought DynamoDB tables were not specified by region, but I noticed that once I created this new version of 'userId' it was viewable under the eu-west region, and then I could see the different (previously missing!) 'userId' table in the us-east region. They both had the same table name but contained different items. I didn't think this was possible?
Most Amazon Web Services services are scoped to a single region. The main exceptions are global services such as Route 53 (DNS), IAM, and CloudFront (CDN). The reason is that you want to control the location of your data, mainly for regulatory reasons. Often your data can't leave the US or Europe or some other region.
It is possible to get high availability for your services within a single region by using availability zones. This is how highly available services such as DynamoDB or S3 provide it: by replicating the data between availability zones, but within a single region.
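To see which region a table actually lives in, you can list the tables region by region. The sketch below is written against the shape of the DynamoDB ListTables response (TableNames, LastEvaluatedTableName); the function name, the client_factory parameter, and the region names in the comment are my own assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List


def list_tables_by_region(
    regions: Iterable[str], client_factory: Callable[[str], object]
) -> Dict[str, List[str]]:
    """Return {region: [table names]} using one DynamoDB client per region.

    client_factory is a hypothetical hook that returns a client exposing
    list_tables(); passing it in keeps the function easy to try out.
    """
    result: Dict[str, List[str]] = {}
    for region in regions:
        client = client_factory(region)
        names: List[str] = []
        kwargs: dict = {}
        while True:
            page = client.list_tables(**kwargs)
            names.extend(page.get("TableNames", []))
            last = page.get("LastEvaluatedTableName")
            if not last:
                break
            # Continue from where the previous page stopped.
            kwargs = {"ExclusiveStartTableName": last}
        result[region] = names
    return result


# Example with boto3 (assumes boto3 is installed and credentials are set up):
#   import boto3
#   tables = list_tables_by_region(
#       ["us-east-1", "eu-west-1"],
#       lambda region: boto3.client("dynamodb", region_name=region),
#   )
```

A table name showing up under two regions means two independent tables, which matches the situation described in the question. The code and the console must point at the same region to see the same data.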