DynamoDB sorted pull - amazon-web-services

So I'm working with DynamoDB. My primary key is a unique identifier, and one of the other attributes is a timestamp. I'd like to pull just the first N events and then continue to pull later events if the client requests them.
I don't know if this is possible to do in Dynamo. Is it possible? How would I go about it? I can redesign the table if needed.
Right now, it seems like the only way to do this would be to query all the data in the table and then sort the results returned.
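One possible redesign, sketched under assumptions: if the table had a partition key shared by all events that belong together and the timestamp as its sort key, a Query would return items already ordered by timestamp, `Limit` would cap the page size, and the `LastEvaluatedKey` from one response could be passed back as `ExclusiveStartKey` to continue. The table name `Events` and the attribute name `stream_id` are hypothetical, and `client` is expected to be a boto3 DynamoDB client (`boto3.client("dynamodb")`) passed in by the caller.

```python
from typing import Any, Dict, List, Optional, Tuple


def fetch_events_page(
    client: Any,
    stream_id: str,
    page_size: int,
    start_key: Optional[Dict[str, Any]] = None,
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Return one page of events, oldest first, plus a cursor for the next page."""
    params: Dict[str, Any] = {
        "TableName": "Events",  # hypothetical table name
        "KeyConditionExpression": "stream_id = :sid",  # hypothetical partition key
        "ExpressionAttributeValues": {":sid": {"S": stream_id}},
        "ScanIndexForward": True,  # ascending order of the sort key (the timestamp)
        "Limit": page_size,
    }
    if start_key is not None:
        # Resume right after the last item of the previous page.
        params["ExclusiveStartKey"] = start_key
    response = client.query(**params)
    # LastEvaluatedKey is absent once there is nothing left to read.
    return response.get("Items", []), response.get("LastEvaluatedKey")
```

The caller would hand the returned cursor back on the next request; a cursor of `None` means there are no more events.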

Related

AWS DAX and GSI negative caching after item creation

I have a table where I defined one partition key and one global secondary index (GSI).
I then create an item and query it both by partition key and by GSI. This works fine: in both cases I can get my item right after it is created in the table.
Now I have added DAX between my application and the DynamoDB table, and I am using the DAX SDK client to retrieve data just like in the manual https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.client.run-application-dotnet.html
I always run into a negative caching scenario for my GSI query.
Meanwhile, the query by partition key works fine through DAX too.
I keep receiving a negative response until the TTL for the item expires. Right after that I'm able to get my item by GSI. I have tried delaying my GSI query to avoid negative caching, but even a 120-second delay makes no difference.
I can't find any info about this case in the documentation and would be glad for any useful information.
I could change my table schema to avoid the GSI, but I believe there is a solution.
I'd suggest including a Minimal, Reproducible Example of your code.
Interestingly, I can find nothing in the documentation that indicates whether DAX works with queries via a GSI or LSI. But since DAX sits in front of the entire DDB service (not just a specific table), and since a GSI is just a special type of table, I'll assume DAX should indeed cache query results from a GSI.
If you write to DDB and then immediately try querying the GSI through DAX, it wouldn't surprise me to see nothing returned. GSIs are eventually consistent, after all. The same would happen if, for instance, you query the GSI through DAX to see if something exists before writing the item.
And given the way DAX works, you'd continue to get nothing back until the DAX query TTL ends.
You should rethink your application design if you're reliant on a GSI being strongly consistent.
For testing purposes, if you
1. write to DDB
2. query GSI directly till you get the item back
3. query GSI through DAX
I would expect to see the query through DAX always return your data. If that's not the case, I'd open a ticket with AWS.
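The test steps above could be sketched roughly like this. It is only an outline: `write_item`, `query_direct` and `query_via_dax` are hypothetical callables you would supply yourself (for example thin wrappers around the plain DynamoDB client and the DAX client), and the attempt count and delay are arbitrary.

```python
import time
from typing import Any, Callable, Dict, List

Items = List[Dict[str, Any]]


def wait_until_visible(
    query_direct: Callable[[], Items],
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> bool:
    """Poll the GSI directly (bypassing DAX) until the item shows up."""
    for _ in range(attempts):
        if query_direct():
            return True
        time.sleep(delay_seconds)
    return False


def check_dax_after_gsi_catches_up(
    write_item: Callable[[], None],
    query_direct: Callable[[], Items],
    query_via_dax: Callable[[], Items],
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> Items:
    """Write, wait for the GSI to catch up, and only then query through DAX."""
    write_item()
    if not wait_until_visible(query_direct, attempts, delay_seconds):
        raise TimeoutError("item never became visible in the GSI")
    # DAX is queried for the first time only after the GSI has the item,
    # so an empty result should not have been cached beforehand.
    return query_via_dax()
```

If the list returned here is empty even though the direct query found the item, that is the case worth raising with AWS.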

How should I create this DynamoDB table for almost 100000 records?

I've got a CSV with nearly 100000 records in it and I'm trying to query this data in AWS DynamoDB, but I don't think I did it right.
When creating the table I added a key column, airport_id, and then wrote a small console app to import all the records, setting a new GUID for the key column on each record. I've only ever used relational databases before, so I don't know if this is supposed to work differently.
The problem came when querying the data: picking a record somewhere in the middle and querying for it using the AWS SDK for .NET produced no results at all. I can only put this down to bad DB design on my part, as some queries did return results.
What I'd like is to be able to query the data, for example:
Get all the records that have iso_country of JP (Japan) - doesn't find many
Get all the records that have municipality of Inverness - doesn't find any, the Inverness records are halfway through the set
The records are there, but I reckon the table does not have the right design to retrieve them in a timely fashion.
How should I create the DynamoDB table based on the below screenshot?

AWS DynamoDB sorting without partition key

I have a DynamoDB table with a partition key (UUID) and a few attributes (name, email, created date, etc.). Created date is one of the attributes of the item, and its format is YYYY-MM-DD. Now there is a requirement change: I have to sort by created date and return the entire data set (that is, I cannot just return the data from one particular partition, but need the whole data from all the partitions in sorted order). I know this might take time, as DynamoDB has to fetch data from all the partitions and sort it afterwards. My question is:
Is this query possible with the current design? I can see that a partition key is required in the query, which is why I am confused, because I cannot give a partition key here.
Is there a better way to redesign the table for such a use case?
Thanks in advance.
Because your table already exists, you can't change its key structure now, and even if you wanted to, you would still be reliant on the UUID as your partition key.
There is, however, functionality to create a global secondary index (GSI) for your DynamoDB table.
By using a GSI you can rearrange your data representation so that the creation date is the partition key of the index instead.
The reason partition keys are important is that data in DynamoDB is distributed across multiple nodes, with all items that share a partition key stored in the same partition. A query is more efficient when it only has to communicate with one partition, as there is no need to wait for the other partitions to return results.
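As a rough sketch of what that could look like with boto3, the helpers below only build the request parameters for `update_table` (to add the GSI) and `query` (to read one date from it). The attribute name `created_date` and the index name `created_date-index` are assumptions for illustration, and the table is assumed to use on-demand capacity; a provisioned table would also need a `ProvisionedThroughput` entry inside `Create`.

```python
from typing import Any, Dict

INDEX_NAME = "created_date-index"  # hypothetical index name


def build_create_gsi_params(table_name: str) -> Dict[str, Any]:
    """Parameters for client.update_table(**params) that add the GSI."""
    return {
        "TableName": table_name,
        # Key attributes of the new index must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "created_date", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": INDEX_NAME,
                    "KeySchema": [
                        {"AttributeName": "created_date", "KeyType": "HASH"},
                    ],
                    # Copy every attribute into the index.
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


def build_query_by_date_params(table_name: str, created_date: str) -> Dict[str, Any]:
    """Parameters for client.query(**params) that read one date from the GSI."""
    return {
        "TableName": table_name,
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "created_date = :d",
        "ExpressionAttributeValues": {":d": {"S": created_date}},
    }
```

With the creation date as the index partition key, each query covers a single date, so reading everything in order would mean querying the dates one after another in ascending order (YYYY-MM-DD strings sort chronologically).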

create custom AWS cloudwatch metric with ID from Postgres table

I have an interesting problem I need to resolve. I have a table A in Postgres. This table is treated like a queue which holds a set of tasks. ID is an incrementing id in Postgres.
I want a metric that contains the currently processed position (ID) and the maximum ID. Both numbers increase every second.
Is there an efficient way to do it?
The easiest way off the top of my head is to execute this SQL query every 10 seconds (varies):
SELECT id FROM a ORDER BY id ASC LIMIT 1;
to get the smallest id, and use the same approach to get the largest id.
But this command is expensive. Is there any better way to do this?
When you insert a new record into the table, return the record ID. When you extract a record do the same. You could cache this in memory, a file, a different DB table, etc. Then run a scheduled task to post these values to CloudWatch as a custom metric.
Example (very simple) PostgreSQL statement that returns the ID when inserting a new record:
INSERT INTO a (name) VALUES ('bob') RETURNING id;
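The scheduled task that posts the cached values could look something like the sketch below. The namespace `TaskQueue` and the metric names `ProcessedId` and `MaxId` are made up for illustration, and `cloudwatch` is expected to be a boto3 CloudWatch client (`boto3.client("cloudwatch")`) passed in by the caller.

```python
from typing import Any, Dict, List


def build_queue_metrics(processed_id: int, max_id: int) -> List[Dict[str, Any]]:
    """Build the MetricData entries for the two queue positions."""
    return [
        # Metric names are hypothetical; "None" is CloudWatch's unitless unit.
        {"MetricName": "ProcessedId", "Value": float(processed_id), "Unit": "None"},
        {"MetricName": "MaxId", "Value": float(max_id), "Unit": "None"},
    ]


def publish_queue_metrics(cloudwatch: Any, processed_id: int, max_id: int) -> None:
    """Post both cached IDs to CloudWatch as custom metrics."""
    cloudwatch.put_metric_data(
        Namespace="TaskQueue",  # hypothetical namespace
        MetricData=build_queue_metrics(processed_id, max_id),
    )
```

The two IDs passed in would be whatever your application cached at insert and extract time, so no extra query against the table is needed.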

How to query a Dynamo DB table without knowing the table name before runtime?

I want to query a Dynamo DB table based on an attribute UpdateTime so that I get the records which were updated in the last 24 hours. But this attribute is not an index in the table. I understand that I need to make this column an index, but I do not know how to write a query expression for this.
I saw this question but the problem is I do not know the table name on which I want to query before runtime.
To find out the table names in your DynamoDB instance, you can use the "ListTables" API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ListTables.html.
Another way to view tables and their data is via the DynamoDB Console: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html.
Once you know the table name, you can either create an index with the UpdateTime attribute as a key or scan the whole table to get the results you want. Keep in mind that scanning a table is a costly operation.
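A minimal sketch of the ListTables and scan approach with boto3 might look like this. It assumes UpdateTime is stored as an ISO 8601 string in UTC, so that string comparison matches chronological order; if it is stored as a number, the `:cutoff` value would need the `N` type instead. `client` is expected to be a boto3 DynamoDB client passed in by the caller.

```python
from typing import Any, Dict, Iterator, List


def list_table_names(client: Any) -> List[str]:
    """Collect all table names at runtime, following ListTables pagination."""
    names: List[str] = []
    params: Dict[str, Any] = {}
    while True:
        response = client.list_tables(**params)
        names.extend(response.get("TableNames", []))
        last = response.get("LastEvaluatedTableName")
        if last is None:
            return names
        params = {"ExclusiveStartTableName": last}


def scan_updated_since(
    client: Any, table_name: str, cutoff_iso: str
) -> Iterator[Dict[str, Any]]:
    """Scan the whole table and yield items with UpdateTime >= cutoff_iso."""
    params: Dict[str, Any] = {
        "TableName": table_name,
        "FilterExpression": "UpdateTime >= :cutoff",
        # Assumption: UpdateTime is an ISO 8601 string attribute.
        "ExpressionAttributeValues": {":cutoff": {"S": cutoff_iso}},
    }
    while True:
        response = client.scan(**params)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return
        # The filter runs after each page is read, so keep paging even
        # when a page comes back empty.
        params["ExclusiveStartKey"] = last_key
```

The cutoff could be computed as `(datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()`, provided that matches the format the items were written with. The scan still reads every item in the table, which is why it is the costly option.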
Alternatively you can create a DynamoDB Stream that captures all of the changes to your tables: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html.