I am using DynamoDB for my backend database operations. One of my tables has a column named 'Region'. When I scan this table with a filter on Region, DynamoDB throws an error, because 'Region' is a DynamoDB reserved word.
How can I rename the column from Region to State?
You don't really have to change the column name. You can refer to it through an expression attribute name placeholder (for example, #r in place of Region):
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ExpressionPlaceholders.html
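As a minimal sketch of the placeholder approach, the function below builds the parameters for a low-level Scan request in which the reserved word Region appears only inside ExpressionAttributeNames. The table name "Customers", the value "us-east-1", and the assumption that Region is a string attribute are illustrative and not taken from the question.

```python
from typing import Any, Dict


def build_region_scan_params(table_name: str, region_value: str) -> Dict[str, Any]:
    """Build Scan parameters that filter on the reserved word 'Region'."""
    return {
        "TableName": table_name,
        # The expression only mentions the placeholder, never the reserved word.
        "FilterExpression": "#r = :region",
        # The placeholder is mapped to the real attribute name here.
        "ExpressionAttributeNames": {"#r": "Region"},
        # Assumption: Region is stored as a string (type "S").
        "ExpressionAttributeValues": {":region": {"S": region_value}},
    }


# "Customers" and "us-east-1" are hypothetical example values.
params = build_region_scan_params("Customers", "us-east-1")

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").scan(**params)
```

The same ExpressionAttributeNames mapping works for query, update, and projection expressions, so the attribute itself never needs to be renamed.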
I'm trying to set up a PostgreSQL migration using AWS DMS with S3 as the target. After running it, I noticed that some tables were missing some columns.
After checking the logs I noticed this message:
Column 'column_name' was removed from table definition 'schema.table': the column data type is LOB and the table has no primary key or unique index
In the migration task settings I tried raising the Maximum LOB size option to 2000000, but I still get the same result.
Does anyone know a workaround for this problem?
I think the problem is that your table does not have a primary key.
From AWS documentation:
Currently, a table must have a primary key for AWS DMS to capture LOB changes. If a table that contains LOBs doesn't have a primary key, there are several actions you can take to capture LOB changes:
Add a primary key to the table. This can be as simple as adding an ID column and populating it with a sequence using a trigger.
Create a materialized view of the table that includes a system-generated ID as the primary key and migrate the materialized view rather than the table.
Create a logical standby, add a primary key to the table, and migrate from the logical standby.
It is also important that the primary key has a simple type, not a LOB type:
In FULL LOB or LIMITED LOB mode, AWS DMS doesn't support replication of primary keys that are LOB data types.
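For the first option (adding a primary key), here is a small sketch that generates the PostgreSQL statement for a surrogate key column. It uses a BIGSERIAL column rather than the sequence-plus-trigger setup mentioned in the quote, which is a substitution on my part; the helper names and the default column name "id" are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def add_surrogate_key_ddl(schema: str, table: str, column: str = "id") -> str:
    """Return DDL that adds an auto-populated primary key column to a table."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    # BIGSERIAL creates a sequence and fills the new column for existing rows.
    return f"ALTER TABLE {target} ADD COLUMN {quote_ident(column)} BIGSERIAL PRIMARY KEY;"


# Placeholder names copied from the log message in the question.
print(add_surrogate_key_ddl("schema", "table"))
```

Quoted identifiers are case-sensitive in PostgreSQL, so the names passed in should match the catalog exactly. On a large table this statement rewrites the table, so it is worth trying on a copy first.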
I am working on a backfill issue where I need to fetch all the unique values of an attribute in a DynamoDB table and call a service to add them to that service's storage. I am thinking of creating a temporary DynamoDB table: I can read the original table in a Lambda function and write only the unique values to the temporary table. Is any other approach possible?
The DynamoDB table has approximately 1,400,000 rows.
1,400,000 records is not that many. You can probably just read the whole table.
You can speed up the read by making your attribute the key of a global secondary index. It need not be unique. Then you can read only the attribute in question or check uniqueness.
If the records in your table are constantly updated, you can listen to the table's DynamoDB stream and just update your temporary table with the new values.
Using the single table pattern https://www.youtube.com/watch?v=EOQqi6Yun7g - your "temporary" table can be just a different primary key prefix.
If you have to scan the table and the process takes too long, you can split it across multiple Lambda invocations by passing the LastEvaluatedKey value along (e.g. with a Step Functions state machine).
You can scan the whole table, use a projection expression to fetch only the relevant attributes, and extract the unique values.
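The scan-based suggestions above can be sketched roughly as follows: a paginated scan that projects a single attribute, follows LastEvaluatedKey, and collects the distinct values in a set. The function name and the injected scan_page callable are hypothetical; the code assumes the low-level response format and a scalar (string or number) attribute.

```python
from typing import Any, Callable, Dict, Optional, Set


def collect_unique_values(
    scan_page: Callable[..., Dict[str, Any]],
    table_name: str,
    attribute: str,
) -> Set[str]:
    """Scan every page of a table and return the distinct values of one attribute."""
    unique: Set[str] = set()
    start_key: Optional[Dict[str, Any]] = None
    while True:
        kwargs: Dict[str, Any] = {
            "TableName": table_name,
            # Fetch only the attribute of interest; the placeholder also
            # protects against reserved words.
            "ProjectionExpression": "#a",
            "ExpressionAttributeNames": {"#a": attribute},
        }
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        page = scan_page(**kwargs)
        for item in page.get("Items", []):
            typed = item.get(attribute)
            if typed:
                # Assumption: scalar value in low-level form, e.g. {"S": "red"}.
                unique.add(next(iter(typed.values())))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return unique


# With boto3 installed and credentials configured, this could be called as:
#   import boto3
#   values = collect_unique_values(boto3.client("dynamodb").scan, "MyTable", "color")
```

To split the work across several Lambda invocations, the loop body could run a bounded number of pages and return start_key so the next invocation resumes from it.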
Another approach is to back up the DynamoDB table to S3 and then process the S3 files to extract the unique values.
I have a table defined in AWS Glue. I use AWS Kinesis streams to stream logs into S3 using this table definition, using parquet file format. It's partitioned by date.
One of the fields in the table, event_payload, is a struct with several fields, one of which is an array of structs. Recently I added a new field to the inner struct in the log data. I want to add it to the table definition so that it will be written to S3 and so that I can query it using AWS Athena.
I tried editing the table schema directly in the console. It does write the data to S3, but I get an exception in Athena when querying:
HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'event_payload' in table 'c2s.logs' is declared as type 'struct<...>', but partition 'year=2019/month=201910/day=20191026/hour=2019102623' declared column 'event_payload' as type 'struct<...>'.
I tried deleting all the partitions and repairing the table, as specified here, but I got another error:
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://.../year=2019/month=201910/day=20191022/hour=2019102216/beaconFirehose-4-2019-10-22-16-34-21-71f183d2-207e-4ae9-98fe-07dda0bab70c.parquet (offset=0, length=801679): Schema mismatch, metastore schema for row column event_payload.markings.element has 8 fields but parquet schema has 7 fields
So the schema has a field which is not present in the data.
Is there a way to specify an optional field? If it's not present, just make it null.
According to the linked documentation, schema updates on nested structures are not supported in Athena. One way to make this work is to flatten the struct type with the Relationalize transform in Glue, for example:
val frames: Seq[DynamicFrame] = lHistory.relationalize(rootTableName = "hist_root", stagingPath = redshiftTmpDir, JsonOptions.empty)
I have set of data: id, name, height and weight.
I am sending this data to AWS IoT in JSON format. From there I need to update the respective columns in DynamoDB, so I have created 3 rules to update name, height and weight, keeping id as the partition key.
But when I send the message, only one column gets updated. If I disable any 2 rules, the remaining rule works fine. So it seems that on every update the columns are overwritten.
How can I update all three columns from the incoming message?
Another answer: in your rule, use the "dynamoDBv2" action instead, which "allows you to write all or part of an MQTT message to a DynamoDB table. Each attribute in the payload is written to a separate column in the DynamoDB database ..."
The answer is: You can't do this with the IoT gateway rules themselves. You can only store data in a single column through the rules (apart from the hash and sort key).
A way around this is to create a rule with a Lambda action that calls, for example, a Python script, which then takes the message and stores it in the table. See also this other SO question.
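A rough sketch of what such a Lambda could do: build a single UpdateItem request that sets every non-key attribute of the message at once, so the three columns no longer overwrite one another. The helper names, the table name "People", and the simple type mapping are assumptions for illustration; the message must contain at least one attribute besides the key.

```python
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Convert a plain JSON value to DynamoDB's low-level typed form."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}
    return {"S": str(value)}


def build_update_params(
    table_name: str, message: Dict[str, Any], key_name: str = "id"
) -> Dict[str, Any]:
    """Build one UpdateItem request that sets all non-key attributes together."""
    fields = {k: v for k, v in message.items() if k != key_name}
    # Placeholders avoid clashes with reserved words such as "name".
    names = {f"#f{i}": k for i, k in enumerate(fields)}
    values = {f":v{i}": to_attribute_value(v) for i, v in enumerate(fields.values())}
    assignments = ", ".join(f"#f{i} = :v{i}" for i in range(len(fields)))
    return {
        "TableName": table_name,
        "Key": {key_name: to_attribute_value(message[key_name])},
        "UpdateExpression": "SET " + assignments,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# Inside a Lambda handler (assuming boto3 and a hypothetical table "People"):
#   def lambda_handler(event, context):
#       boto3.client("dynamodb").update_item(**build_update_params("People", event))
```

Because UpdateItem with SET only touches the attributes named in the expression, any other attributes already stored on the item are left as they are.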
I want to query a DynamoDB table on an attribute UpdateTime so that I get the records updated in the last 24 hours. But this attribute is not indexed in the table. I understand that I need to create an index on this column, but I do not know how to write a query expression for this.
I saw this question, but the problem is that I do not know the name of the table I want to query until runtime.
To find out the table names in your DynamoDB instance, you can use the "ListTables" API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ListTables.html.
Another way to view tables and their data is via the DynamoDB Console: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html.
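Since the table name is only known at runtime, a small sketch of calling ListTables page by page may help. The function name and the injected list_tables_page callable are hypothetical; ExclusiveStartTableName and LastEvaluatedTableName are the pagination fields of the ListTables API.

```python
from typing import Any, Callable, Dict, List, Optional


def list_all_tables(list_tables_page: Callable[..., Dict[str, Any]]) -> List[str]:
    """Collect every table name by following ListTables pagination."""
    names: List[str] = []
    start: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {}
        if start is not None:
            # Resume after the last table name returned by the previous page.
            kwargs["ExclusiveStartTableName"] = start
        page = list_tables_page(**kwargs)
        names.extend(page.get("TableNames", []))
        start = page.get("LastEvaluatedTableName")
        if start is None:
            return names


# With boto3 installed and credentials configured, this could be called as:
#   import boto3
#   tables = list_all_tables(boto3.client("dynamodb").list_tables)
```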
Once you know the table name, you can either create an index with the UpdateTime attribute as a key or scan the whole table to get the results you want. Keep in mind that scanning a table is a costly operation.
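For the scan option, here is a minimal sketch of the request parameters with a filter on UpdateTime. It assumes UpdateTime is stored as a number of epoch seconds, which the question does not state; if it is an ISO-8601 string, the cutoff would need to be a string of type "S" instead. The function name and the table name "Orders" are hypothetical.

```python
import time
from typing import Any, Dict


def build_recent_scan_params(
    table_name: str, now_epoch: float, window_seconds: int = 24 * 60 * 60
) -> Dict[str, Any]:
    """Build Scan parameters matching items updated within the last window."""
    cutoff = int(now_epoch) - window_seconds
    return {
        "TableName": table_name,  # resolved at runtime, e.g. from ListTables
        "FilterExpression": "#t >= :cutoff",
        "ExpressionAttributeNames": {"#t": "UpdateTime"},
        # Assumption: UpdateTime is a numeric epoch timestamp (type "N").
        "ExpressionAttributeValues": {":cutoff": {"N": str(cutoff)}},
    }


# "Orders" is a hypothetical table name.
params = build_recent_scan_params("Orders", time.time())

# With boto3 installed and credentials configured:
#   import boto3
#   response = boto3.client("dynamodb").scan(**params)
```

The filter is applied after items are read, so this still consumes read capacity for the whole table, and large results arrive in pages linked by LastEvaluatedKey. An index keyed on UpdateTime avoids the full read, but a Query against it also needs an equality condition on the index's partition key.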
Alternatively you can create a DynamoDB Stream that captures all of the changes to your tables: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html.