How to query a DynamoDB table without knowing the table name before runtime? - amazon-web-services

I want to query a DynamoDB table on an attribute, UpdateTime, so that I get the records that were updated in the last 24 hours. This attribute is not indexed in the table. I understand that I need to create an index on it, but I do not know how to write the query expression.
I saw this question, but the problem is that I do not know the name of the table I want to query until runtime.

To find the names of the DynamoDB tables in your account and region, you can use the "ListTables" API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ListTables.html.
Another way to view tables and their data is via the DynamoDB Console: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html.
Once you know the table name, you can either create an index with the UpdateTime attribute as a key or scan the whole table with a filter to get the results you want. Keep in mind that a scan reads every item in the table, so it is a costly operation on large tables.
Alternatively, you can enable a DynamoDB Stream that captures the item-level changes to your table: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html.
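
As a rough illustration of the ListTables-then-scan route, here is a minimal Python sketch. It assumes UpdateTime is stored as a Number holding epoch seconds (the question does not say how it is stored), and that `client` is a boto3 DynamoDB client or any object with the same `list_tables` and `scan` methods. The helper names are made up for this example.

```python
import time


def list_table_names(client):
    """Collect every table name, following ListTables pagination."""
    names, params = [], {}
    while True:
        page = client.list_tables(**params)
        names.extend(page.get("TableNames", []))
        last = page.get("LastEvaluatedTableName")
        if not last:
            return names
        params["ExclusiveStartTableName"] = last


def build_recent_scan_params(table_name, attribute="UpdateTime",
                             window_seconds=24 * 3600, now=None):
    """Build Scan arguments that keep items updated within the window.

    Assumption: the attribute is a Number containing epoch seconds.
    """
    now = time.time() if now is None else now
    cutoff = int(now - window_seconds)
    return {
        "TableName": table_name,
        "FilterExpression": "#ts >= :cutoff",
        "ExpressionAttributeNames": {"#ts": attribute},
        "ExpressionAttributeValues": {":cutoff": {"N": str(cutoff)}},
    }


def scan_recent(client, table_name, **kwargs):
    """Scan the table page by page and return the matching items."""
    params = build_recent_scan_params(table_name, **kwargs)
    items = []
    while True:
        page = client.scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        params["ExclusiveStartKey"] = last_key
```

With boto3 installed, something like `client = boto3.client("dynamodb")` followed by `scan_recent(client, name)` for a name chosen from `list_table_names(client)` should work. The filter is applied after the items are read, so the scan still consumes read capacity for the whole table.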

Related

Materialized view on firebase events data in big query

I have a table in BigQuery that receives data from my Firebase Analytics events. The table is set up in such a way that it appears to create a new table for each date, but in the BigQuery UI it appears as a single table. The name changes when I click on it.
For example, today's table would be named 'tablename'_<today's date>, and when I use the filter option available in the BigQuery UI for this particular table type, the name changes to 'tablename'_<selected date>.
I want to create a materialized view on the complete, combined table of all dates, not just on the table for a particular date.
How can I do it? Can someone please guide me?
The columns I want in the materialized view are event_date, event_name, event_params (key and value), and the count of event_params (of both key and value).
I am unable to find a way. So is it possible?
Thank you

Getting unique attributes from DynamoDB table

I am working on a backfill issue where I need to fetch all the unique values of an attribute in a DynamoDB table and call a service to add them to that service's storage. I am thinking of creating a temporary DynamoDB table: I can read the original table in a Lambda function and write only the unique values to the temporary table. Is there any other approach possible?
The DynamoDB table has approximately 1,400,000 rows.
1,400,000 records is not that many. You can probably just read the whole table.
You can improve the read by making your attribute the key of a global secondary index. It need not be unique. Then you can read only the attribute in question or check uniqueness.
If the records in your table are constantly updated, you can listen to the DynamoDB update stream and just update your temporary table with the new values.
Using the single table pattern https://www.youtube.com/watch?v=EOQqi6Yun7g - your "temporary" table can be just a different primary key prefix.
If you have to scan the table and the process takes too long, you can split it across multiple Lambda invocations by passing the LastEvaluatedKey value from one to the next (e.g. with a Step Functions state machine).
You can scan the whole table, using a projection expression to fetch only the relevant attributes, and then extract the unique values.
Another approach is to take a backup of the DynamoDB table to S3 and then process the S3 file to extract the unique values.
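
The scan-based suggestions above (projection expression plus passing LastEvaluatedKey between invocations) could be sketched roughly as follows in Python. It assumes the attribute is a scalar (String, Number, or Binary) and that `client` is a boto3 DynamoDB client or an object with the same `scan` method. The function name and the `max_pages` budget are invented for this example.

```python
def scan_unique_values(client, table_name, attribute,
                       start_key=None, max_pages=None):
    """Scan part or all of a table and collect unique values of one attribute.

    Returns (unique_values, last_key). last_key is None once the scan has
    reached the end of the table; otherwise pass it back in as start_key
    (for example from the next Lambda invocation) to resume.
    """
    params = {
        "TableName": table_name,
        # Fetch only the attribute of interest; the placeholder avoids
        # clashes with DynamoDB reserved words.
        "ProjectionExpression": "#attr",
        "ExpressionAttributeNames": {"#attr": attribute},
    }
    if start_key:
        params["ExclusiveStartKey"] = start_key

    unique = set()
    pages = 0
    while True:
        page = client.scan(**params)
        pages += 1
        for item in page.get("Items", []):
            value = item.get(attribute)
            if value is None:
                continue  # item does not carry the attribute
            # Low-level values look like {"S": "abc"}; keep type and value.
            (type_tag, raw), = value.items()
            unique.add((type_tag, raw))
        last_key = page.get("LastEvaluatedKey")
        if not last_key or (max_pages is not None and pages >= max_pages):
            return unique, last_key
        params["ExclusiveStartKey"] = last_key
```

Each invocation would need to merge its set into shared storage (such as the temporary table mentioned in the question), since the set here only lives for the duration of one call.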

BigQuery Cannot Modify Partitioned Table Schema

Per the BigQuery documentation I am attempting to modify a table's schema by adding a field. The table in question is a partition slice (partitioned by day). I am planning on performing the action on every slice.
Per the documentation (https://cloud.google.com/bigquery/docs/managing-partitioned-tables), I should be able to add a field to a partitioned table like any other table. However, whenever I attempt to add a field to a partitioned table, I am met with this error:
Could not edit table schema.: Cannot change partitioned/clustered table to non partitioned/clustered table.
I am not able to find any good information on what this error means, or what I'm doing wrong. I have successfully added a field to a non-partitioned table. Does the community have any good ideas to help me troubleshoot?
I understand that you are using the update_table method to update the schema in Python; correct me if I'm wrong. You have to do it with the patch API instead. You can try this API to get a better view of how to do it.
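
To make the patch suggestion a little more concrete, here is a hedged Python sketch that only builds the request for the BigQuery REST `tables.patch` method, without sending it. The idea is that a patch body containing only `schema` leaves the other table properties, including the partitioning settings, as they are. The helper names are hypothetical, and the sketch assumes you have already fetched the table's current schema fields as a list of dicts.

```python
def patch_url(project_id, dataset_id, table_id):
    """URL of the REST tables.patch endpoint for one table."""
    return (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"
    )


def build_schema_patch(existing_fields, new_field):
    """Return a patch body that appends one field to the schema.

    Only the schema is included, so properties that are not mentioned
    (such as timePartitioning) are not part of the change.
    """
    # BigQuery column names are case-insensitive, so compare in lowercase.
    taken = {field["name"].lower() for field in existing_fields}
    if new_field["name"].lower() in taken:
        raise ValueError(f"field {new_field['name']!r} already exists")
    # Columns added to an existing table must be NULLABLE or REPEATED.
    if new_field.get("mode", "NULLABLE") == "REQUIRED":
        raise ValueError("an added field cannot be REQUIRED")
    return {"schema": {"fields": list(existing_fields) + [new_field]}}
```

The resulting body would be sent as an authenticated HTTP PATCH to `patch_url(...)`. For a table sharded by day into separate tables, the same body would have to be applied to each one.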

DynamoDB table within table

Is there any way to create a table inside a table with DynamoDB? I have a table that I expect to hold a lot of other information, and another table inside could be useful.
A DynamoDB table can have a list of maps as an attribute, so you could store your JSON objects as native lists/maps within the table. However, if you're appending frequently, keep in mind that the maximum item size in DynamoDB is 400 KB, so you may be better off having a separate table and "joining" on it.
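
As an illustration of the list-of-maps option, the following Python sketch builds the arguments for an UpdateItem call that appends one map to a list attribute, creating the list if it does not exist yet. The helper names are made up, and the small `to_attr` converter is a stand-in that covers only a few Python types, not the full DynamoDB type system.

```python
def to_attr(value):
    """Convert a plain Python value to DynamoDB's low-level format."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attr(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attr(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_append_params(table_name, key, list_attribute, new_entry):
    """Arguments for UpdateItem that append new_entry to a list attribute."""
    return {
        "TableName": table_name,
        "Key": {k: to_attr(v) for k, v in key.items()},
        # if_not_exists supplies an empty list the first time around.
        "UpdateExpression":
            "SET #list = list_append(if_not_exists(#list, :empty), :new)",
        "ExpressionAttributeNames": {"#list": list_attribute},
        "ExpressionAttributeValues": {
            ":empty": {"L": []},
            ":new": {"L": [to_attr(new_entry)]},
        },
    }
```

With a boto3 client, the result could be passed as `client.update_item(**params)`. Every append grows the same item, which is where the 400 KB item limit mentioned above comes into play.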

How to select a partition key for a DynamoDB query?

I have created a DynamoDB table named "sample". It has the columns below. CreatedDate holds the creation time of each record inserted into this table.
Itemid,
ItemName,
ItemDescription,
CreatedDate,
UpdatedDate
I am creating a Python Flask-based REST API that always fetches the last 100 records inserted into this table. This API (a Python Flask function) does not have any input parameters. It should just return the last records inserted into this table.
Question 1
What should the partition key for this table be? I am using the boto3 library to fetch records from DynamoDB. I prefer not to do a scan operation because it may cause performance issues. If I use the query function, it asks for a partition key. Since this REST API does not accept any input, I am not sure how to use it.
Question 2
Has anyone faced a similar situation? If so, what was done to fix it?
Note: I am pretty much a newbie to DynamoDB, NoSQL and Boto.
To query your table using CreatedDate without knowing the ItemId, you can use Global Secondary Index write sharding: add an attribute (e.g., ShardId) containing a (0-N) value to every item, and use that attribute as the global secondary index partition key.
Depending on how your items are distributed against CreatedDate, you can set the ShardId so that it is likely to give evenly distributed access patterns. For example: YYYY, YYYYMM or YYYYMMDD. Then, you create a global secondary index with ShardId as the index partition key and CreatedDate as the index sort key.
Because you know the partition key for your GSI (the ShardId value is derived from CreatedDate), you can query the index for the 100 most recent items by using Query's Limit parameter with ScanIndexForward set to false so that results come back newest first (and follow LastEvaluatedKey if the result set is larger than 1 MB of data).
See Using Global Secondary Index Write Sharding for Selective Table Queries.
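
A minimal Python sketch of that query, assuming the ShardId is the month in YYYYMM form and that CreatedDate is stored in a format that sorts chronologically (for example an ISO 8601 string). The index name passed in and the helper names are hypothetical, and `client` is assumed to be a boto3 DynamoDB client or an object with the same `query` method. The sketch walks backwards through the monthly shards until it has collected enough items.

```python
from datetime import date


def month_shards(start, count):
    """Yield 'YYYYMM' shard ids, newest first, beginning with start's month."""
    year, month = start.year, start.month
    for _ in range(count):
        yield f"{year:04d}{month:02d}"
        month -= 1
        if month == 0:
            year, month = year - 1, 12


def latest_items(client, table_name, index_name, wanted=100,
                 today=None, max_shards=12):
    """Return up to `wanted` most recent items by querying the sharded GSI."""
    today = today or date.today()
    items = []
    for shard in month_shards(today, max_shards):
        params = {
            "TableName": table_name,
            "IndexName": index_name,
            "KeyConditionExpression": "ShardId = :shard",
            "ExpressionAttributeValues": {":shard": {"S": shard}},
            "ScanIndexForward": False,  # newest CreatedDate first
        }
        while len(items) < wanted:
            params["Limit"] = wanted - len(items)
            page = client.query(**params)
            items.extend(page.get("Items", []))
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break  # this shard is exhausted, move to the previous month
            params["ExclusiveStartKey"] = last_key
        if len(items) >= wanted:
            break
    return items[:wanted]
```

In the Flask handler this might be called as `latest_items(client, "sample", "ShardId-CreatedDate-index")`, where the index name is whatever you chose when creating the GSI. The `max_shards` cap keeps the loop bounded when the table holds fewer than 100 items.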