DynamoDB: UpdatedDate as a Sort Key on a GSI? - amazon-web-services

As stated in this question, I've assumed that you can't have something like an updated date as the sort key of a table, because updating it would create a duplicate record.
Further, I've always assumed that the same thing applied to a GSI using updated date. But in my scenario I have the updated date as a sort key on a GSI, and no new records are created when I update the original item.
To recap, the attributes and key schema are:
Attributes:
Id
MySortKey
MyComputedField
UpdatedDate
Table:
PartitionKey: Id
SortKey: MySortKey
GSI:
PartitionKey: MyComputedField
SortKey: UpdatedDate
My question is, am I indirectly affecting the performance of the index by doing this? Or are there any other issues caused by this pattern that I'm not aware of?

Global Secondary Indexes are separate tables under the hood, and changed items from the base table are replicated to them.
As you observed correctly, you can use a changing attribute as the sort key in a GSI without creating duplicates when you write to the base table.
Note that there is no guarantee of uniqueness in a GSI, i.e. you can have more than one item with the same key attributes.
In addition, you can only do eventually consistent reads from GSIs.
GSIs also have their own read and write capacity units that you need to provision, and if you change items in the base table that need to be replicated, the operation will consume write capacity units on the GSI.
Reads are separate from that: the RCUs on the GSI remain unaffected by writes to the table.
But if you change items often, you may see some inconsistencies for a very brief period of time (that's why only eventually consistent reads are possible).
That means you can use the pattern if you can live with the side effects I mentioned.
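For illustration, here is a minimal boto3 sketch of the pattern; the table name MyTable, the index name MyComputedField-UpdatedDate-index, and the sample key values are assumptions, while the key attributes come from the question above.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # hypothetical table name

# Updating the item rewrites UpdatedDate in place on the base table; DynamoDB
# then removes the old GSI entry and writes a new one, so the index keeps a
# single (re-sorted) entry per item rather than accumulating duplicates.
table.update_item(
    Key={"Id": "item-1", "MySortKey": "a"},
    UpdateExpression="SET UpdatedDate = :d",
    ExpressionAttributeValues={":d": "2024-05-01T12:00:00Z"},
)

# Reads from the GSI are always eventually consistent; newest items first.
resp = table.query(
    IndexName="MyComputedField-UpdatedDate-index",  # hypothetical index name
    KeyConditionExpression=Key("MyComputedField").eq("some-value"),
    ScanIndexForward=False,
)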

Related

What should be DynamoDB key schema for time sliding window data get scenario?

What I never understood about DynamoDB is how to design a table to efficiently get all data with one particular field lying in some range. For example, a time range - we would like to get data created from timestamp1 up to timestamp2. According to the key design, only the sort key can be used for such a purpose. However, that automatically means the partition key would have to be the same for all data, and according to the documentation that is an anti-pattern of DynamoDB usage. How do you deal with this situation? Could creating an evenly distributed primary key, and then a secondary index whose partition key is the same for all items but whose sort key differs for each of them, be a better solution?
You can use a Global Secondary Index, which in essence is:
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on attributes other than the unique primary key.
I.e., in case what I meant isn't clear: you can choose something that can be unique as the table's primary key, and use a repetitive attribute as the GSI key on which you are going to base your query.
NOTE: One of the widest applications of NoSQL databases is storing time series, where you cannot expect to have a unique identifier as the partition key unless you include the timestamp.
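A minimal boto3 sketch of that idea, assuming a hypothetical Events table with a unique id as its partition key and a GSI named day-createdAt-index whose partition key is a repetitive day bucket; the names and the day-bucket refinement are assumptions, not from the question.
import boto3
from boto3.dynamodb.conditions import Key

events = boto3.resource("dynamodb").Table("Events")  # hypothetical table name

# GSI "day-createdAt-index" (hypothetical): partition key "day" is a repetitive
# bucket (here a calendar day), sort key "createdAt" is the full timestamp.
# Bucketing by day keeps any single GSI partition from holding the whole table.
resp = events.query(
    IndexName="day-createdAt-index",
    KeyConditionExpression=Key("day").eq("2024-05-01")
    & Key("createdAt").between("2024-05-01T08:00:00Z", "2024-05-01T17:00:00Z"),
)
items = resp["Items"]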

DynamoDB partition key choice for notes app

I want to create a DynamoDB table that allows me to save notes from users.
The attributes I have:
user_id
note_id (uuid)
type
text
The main queries I will need:
Get all notes of a certain user
Get a specific note
Get all notes of a certain type (the less used query)
I know that in terms of performance and DynamoDB partitions note_id would be the right choice, because the ids are unique and would be distributed equally over the partitions, but on the other hand it is much harder to get all notes of a user without scanning all items or using a GSI. And if they are unique I suppose it doesn't make any sense to have a sort key.
The other option would be to use user_id as the partition key and note_id as the sort key, but if certain users have a much larger number of notes than others, wouldn't that impact my performance?
Is it better to have a partition key unique (like note_id) to scale well with DynamoDB partitions and use GSIs to create my queries or to use instead a partition key for my main query (user_id)?
Thanks
Possibly the simplest and most cost-effective way would be a single table:
Table Structure
note_id (uuid) / hash key
user_id
type
text
Have two GSIs, one for "Get all notes of a certain user" and one for "Get all notes of a certain type (the less used query)":
GSI for "Get all notes of a certain user"
user_id / hash key
note_id (uuid) / range key
type
text
A little note on this - which of your queries is the most frequent: "Get all notes of a certain user" or "Get a specific note"? If it's the former, then you could swap the GSI keys for the table keys and vice-versa (if that makes sense - in essence, have your user_id + note_id as the key for your table and the note_id as the GSI key). This also depends upon how you structure your user_id - as I suspect you've already picked up on, make sure your user_id is not sequential; make it a UUID or similar.
GSI for "Get all notes of a certain type (the less used query)"
type / hash key
note_id (uuid) / range key
user_id
text
Depending upon the cardinality of the type field, you'll need to test whether a GSI will actually be of benefit here or not.
If the GSI is of little benefit and you need more performance, another option would be to store the type with an array of note_id in a separate table altogether. Beware of the 400 KB item size limit with this one and the fact that you'll need to perform another query to get the text of the note.
With this table structure and GSIs, you're able to make a single query for the information you're after, rather than making two if you have two tables.
Of course, you know your data best - it's best to start with what you think is best and then test it to ensure it meets what you're looking for. DynamoDB is priced by provisioned throughput plus the amount of indexed data stored, so if you create "fat" indexes with many attributes projected, as above, and there is a lot of data, it could become more cost effective to perform two queries and store less indexed data.
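To make the layout concrete, here is a hedged boto3 sketch that creates such a table with the two GSIs; the table name, index names, and the 5/5 capacity figures are placeholders, not recommendations.
import boto3

client = boto3.client("dynamodb")

# Projections are kept "fat" (ALL) so each query is a single round trip;
# slim them down if index storage cost becomes a concern.
client.create_table(
    TableName="notes",
    AttributeDefinitions=[
        {"AttributeName": "note_id", "AttributeType": "S"},
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "type", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "note_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    GlobalSecondaryIndexes=[
        {
            "IndexName": "user_id-note_id-index",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "note_id", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        },
        {
            "IndexName": "type-note_id-index",
            "KeySchema": [
                {"AttributeName": "type", "KeyType": "HASH"},
                {"AttributeName": "note_id", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        },
    ],
)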
I would use user_id as your primary partition(hash) key and note_id as your primary range(sort) key.
You have already noted that in an ideal situation each partition key is accessed with equal regularity to optimise performance (see Design For Uniform Data Access Across Items In Your Tables). The use of user_id is perfectly fine as long as you have a good spread of users who regularly log in. Indeed AWS specifically encourage this option (see the 'Choosing a Partition Key' table in the link above).
This approach will also make your application code much simpler than your alternative approach.
You then have a second choice, which is whether to apply a Global Secondary Index for your get-notes-by-type query. A GSI key, unlike a primary key, does not need to be unique (see the AWS GSI guide), therefore I suggest you simply use type as your GSI partition key, without a range key.
The obvious plus side to using a GSI is a faster result when you perform the note type query. However, you should be aware of the downsides also. A GSI has a separate throughput allowance from your table, so you need to provision this in addition to your table throughput (at extra cost). If you don't provision your GSI with enough read units, it could end up slower than a scan on your table. If you don't provision enough write units, your table writes could be throttled, even if your table has enough write units.
Also, AWS warn that GSIs are updated asynchronously (usually within a fraction of a second, but it can be longer). This means queries on your GSI might return the 'wrong' result if you have table writes and index reads very close together. If this is a problem you would need to handle it in your application code.
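If it helps, here is a hedged boto3 sketch of the notes-by-type query, assuming a table named notes and a GSI named type-index with type as its partition key (both names are assumptions).
import boto3
from boto3.dynamodb.conditions import Key

notes = boto3.resource("dynamodb").Table("notes")  # hypothetical table name

# "Get all notes of a certain type" via the type GSI; the read is eventually
# consistent and results are paginated at 1 MB per response.
resp = notes.query(
    IndexName="type-index",  # hypothetical index name
    KeyConditionExpression=Key("type").eq("todo"),
)
for note in resp["Items"]:
    print(note["note_id"], note.get("text"))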
I see this as 2 tables: users and notes, with a GSI on the notes table. Not sure how else you could do it. Using user_id as the partition key and note_id as the sort key means you can only retrieve a specific element when you know both the user_id and the note_id. With DynamoDB, if you're not scanning, you have to satisfy all the elements in the primary key, so both the partition key and the sort key if there is one. Below is how I would do this.
Get all notes of a certain user
When a user creates a note, I would add its id to the user's notes attribute in the users table (see the sketch after the example below). When you want to get all of a user's notes, retrieve the user and access the array/list of note_ids stored there.
{
userId: xxx,
notes: [note_id_1, note_id_2, note_id_3]
}
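A minimal boto3 sketch of that write, assuming a hypothetical users table keyed by userId; note_id_4 is a placeholder for the id generated when the note is saved.
import boto3

users = boto3.resource("dynamodb").Table("users")  # hypothetical table name

# Append the new note's id to the user's notes list, creating the list the
# first time this user saves a note.
users.update_item(
    Key={"userId": "xxx"},
    UpdateExpression="SET notes = list_append(if_not_exists(notes, :empty), :new)",
    ExpressionAttributeValues={":new": ["note_id_4"], ":empty": []},
)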
Get a specific note
A notes table with note_id as the primary key would make that easy.
{
noteId: XXXX,
note: "sfsfsfsfsfsf",
type: "standard_note"
}
Get all notes of a certain type (the less used query)
I would use a GSI on the notes table for this with the attributes of "note_type" and note_id projected onto it.
Update
You can pull this off with one table and a GSI (see the other two answers for how), but I would not do it. Your data model is so simple; why make it more complicated than users and notes?

DynamoDB ConsistentRead for Global Indexes

I have the following table structure:
ID string `dynamodbav:"id,omitempty"`
Type string `dynamodbav:"type,omitempty"`
Value string `dynamodbav:"value,omitempty"`
Token string `dynamodbav:"token,omitempty"`
Status int `dynamodbav:"status,omitempty"`
ActionID string `dynamodbav:"action_id,omitempty"`
CreatedAt time.Time `dynamodbav:"created_at,omitempty"`
UpdatedAt time.Time `dynamodbav:"updated_at,omitempty"`
ValidationToken string `dynamodbav:"validation_token,omitempty"`
and I have 2 Global Secondary Indexes, for the Value field (ValueIndex) and the Token field (TokenIndex). Later, somewhere in the internal logic, I perform an Update of this entity and an immediate read of this entity by one of these indexes (ValueIndex or TokenIndex), and I see the expected problem that the data is not ready (I mean not yet updated). I can't use ConsistentRead for these cases, because these are Global Secondary Indexes and they don't support this option. As a result I can't run my load tests over this logic, because the data is not ready when the tests run in 10-20-30 threads. So my question is: is it possible to solve this problem somehow? Or should I reorganize my table and split it into 2-3 different tables, moving fields like Value and Token to the HASH key or SORT key?
GSIs are updated asynchronously from the table they are indexing. The updates to a GSI typically occur in well under a second. So, if you're after an immediate read of a GSI after an insert / update / delete, then there is the potential to get stale data. This is how GSIs work - nothing you can do about that. However, you need to be really mindful of three things:
Make sure you keep your GSI lean - that is, only project the absolute minimum attributes that you need. Less data to write will make it quicker.
Ensure that your GSIs have the correct provisioned throughput. If they don't, they may not be able to keep up with activity in the table, and you'll see long delays before the GSI is brought back in sync.
If an update causes the keys in the GSI to be updated, you'll need 2 units of throughput provisioned per update. In essence, DynamoDB will delete the item then insert a new item with the keys updated. So, even though your table has 100 provisioned writes, if every single write causes an update to your GSI key, you'll need to provision 200 write units.
Once you've tuned your DynamoDB setup and you still absolutely cannot handle the brief delay in GSIs, you'll probably need to use different technology. For example, even if you decided to split your table into multiple tables, it'll have the same (if not worse) impact: you'll update one table, then try to read the data from another table before the values have been written there.
I suspect that once you tune DynamoDB for your situation, you'll get pretty damn close to what you want.
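If the load tests just need to tolerate the replication lag rather than eliminate it, one common workaround is to poll the GSI briefly with a short delay. A hedged boto3 sketch, assuming a hypothetical Entities table and the ValueIndex from the question (the "value" attribute name follows the struct tags above):
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Entities")  # hypothetical table name

def query_value_index(value, attempts=10, delay=0.1):
    # Poll the eventually consistent ValueIndex briefly until the freshly
    # written item shows up, instead of failing on the first empty read.
    for _ in range(attempts):
        resp = table.query(
            IndexName="ValueIndex",
            KeyConditionExpression=Key("value").eq(value),
        )
        if resp["Items"]:
            return resp["Items"]
        time.sleep(delay)
    return []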

Guidelines for creating GSI in DynamoDB

Imagine that you need to persist something that can be represented with following schema:
{
type: String
createdDate: String (ISO-8601 date)
userId: Number
data: {
reference: Number,
...
}
}
type and createdDate are always defined/required, everything else such as userId, data and whatever fields within data are optional. Combination of type and createdDate does not guarantee any uniqueness. Number of fields within data (when data exists) may differ.
Now imagine that you need to query against this structure like:
Give me items where type is equal to something
Give me items where userId is equal to something
Give me items where type AND userId are equal to something
Give me items where userId AND data.reference are equal to something
Give me items where userId is equal to something, where type IS IN range of values and where data.reference is equal to something
As it seems to me, a HashKey needs to be introduced at the table level to uniquely identify an item. The only choice I have is to use something like a UUID generator. Based on that, I can't query the table for anything I described above. So I need to create several global secondary indexes to cover all five cases above, as follows:
For the first use case I could create a GSI where type is the HashKey and createdDate is the RangeKey. What bothers me from the start here, as I mentioned, is that there is a high chance of this composite key NOT being unique.
For the second use case I could create a GSI where userId is the HashKey and createdDate is the RangeKey.
Here this composite key can probably match an item uniquely.
For the third use case, I probably have two solutions. One is to create a third GSI where type is the HashKey and userId is the RangeKey. With that approach I'm losing the ability to sort the returned data, and again the same worry: this composite key does not guarantee uniqueness. Another approach would be to use one of the two previous GSIs with a FilterExpression, right?
For the fourth use case I have only one option: to use the previous GSI with userId as the HashKey and createdDate as the RangeKey, and to use a FilterExpression against data.reference. An index can't be created on fields of a nested object, right?
For the fifth use case, because the IN operator is only supported via FilterExpression (right?), the only option again is to use the GSI with userId as the HashKey and createdDate as the RangeKey, and to use a FilterExpression for both type and data.reference?
So the only bright side of this problem I see is using the GSI with userId as the HashKey and createdDate as the RangeKey. But again, userId is not a mandatory field, it can be NULL. A HashKey can't be NULL, right?
Most importantly, if a composite key (HashKey and RangeKey) can't guarantee uniqueness, does that mean that saving an item with a composite key that already exists will silently overwrite the previous item, which means I will lose data?
The thing about DynamoDB: it is a NoSQL database. On the plus side, it is easy to dump pretty much anything into it so long as you have a unique index, and it will be stored fairly efficiently for retrieval if you have a good partition key that sub-divides your data into chunks. On the downside, any query you do against fields that are not the partition key or an index (primary or secondary) is a slow table scan by definition. DynamoDB is not an SQL database and cannot give SQL-like performance when filtering non-indexed columns. For the performance you see to be reasonable, you need to delimit your query to pre-calculated index values available before doing a query, or you need to know that the results you are looking for are delimited to a few partition keys.
First let's consider the delimited partition keys route. Once you have delimited the partition keys as much as you can and there are no more indexes to reference, everything else you ask DynamoDB is not really a query, but a table scan. You can ask DynamoDB to do it for you, but you may well be better off taking the full results from a partition key or index query and doing the filter yourself in whatever language you are using. I use Java for this purpose because it is simple to do a query for the keys I need through the Java->DynamoDB API and easy to then filter the results in Java. If this is interesting to you I can put together some simple examples.
If you go the index and filter route, understand that the hash key is still a partition key for the index, which is going to determine how much the GSI can be used in parallel. The bigger your DynamoDB table becomes and the more time sensitive your queries are, the bigger the issue this will become.
So yes, you can make the queries you want with indexes, though it will take some careful planning of those indexes.
1. For the first use case I could create a GSI where type is the HashKey and createdDate is the RangeKey. What bothers me from the start here, as I mentioned, is that there is a high chance of this composite key NOT being unique.
GSIs do not have to be unique. You will receive multiple rows on a query, but nothing will be broken from DynamoDB's perspective. However, if you use type as your partition key (HashKey), the performance of this query will likely be poor unless you have few records for each of your type values.
2. For the second use case I could create a GSI where userId is the HashKey and createdDate is the RangeKey. Here this composite key can probably match an item uniquely.
No problems here so long as your userIds are unique on a given day.
3. For the third use case, I probably have two solutions. One is to create a third GSI where type is the HashKey and userId is the RangeKey. With that approach I'm losing the ability to sort the returned data, and again the same worry: this composite key does not guarantee uniqueness. Another approach would be to use one of the two previous GSIs with a FilterExpression, right?
So the RangeKey is your sort key, at least from DynamoDB's perspective. And yes, if you use a GSI and then filter, you are effectively scanning the contents of the rows the GSI indexes. But yes, if you are combining two GSIs, you either generate a third index in advance or you filter/scan. DynamoDB has no provisions for doing an INNER JOIN on two indexes. And having type as your partition key and then filtering the results has serious performance issues.
4. For the fourth use case I have only one option: to use the previous GSI with userId as the HashKey and createdDate as the RangeKey, and to use a FilterExpression against data.reference. An index can't be created on fields of a nested object, right?
I am not sure about your nested object question, but yes, using the previous GSI with a filter/scan will work.
5. For the fifth use case, because the IN operator is only supported via FilterExpression (right?), the only option again is to use the GSI with userId as the HashKey and createdDate as the RangeKey, and to use a FilterExpression for both type and data.reference?
Yes, if you want DynamoDB to do the work for you, this is the way to approach your fifth query. But I go back to my original statement: why do this? If you can create a GSI that efficiently gets you to the records you are interested in, use a GSI. But personally, I never use filter expressions: I get the full partition, index, or GSI results back from a query and do the filtering myself in my programming language of choice.
If you need to do everything in DynamoDB your methods will work, but they may not be very fast depending on how many rows are being filtered. I beat on the performance issue pretty hard because I have seen lots of work go into a database project and then had the whole thing not get used because poor performance made it unusable.
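The answer above does this in Java; purely as an illustration of the same query-then-filter-in-code idea, here is a hedged boto3 sketch in which the table name, index name, and example values are all assumptions.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Items")  # hypothetical table name

# Query the userId/createdDate GSI (hypothetical index name), then apply the
# remaining conditions in application code instead of with FilterExpression.
resp = table.query(
    IndexName="userId-createdDate-index",
    KeyConditionExpression=Key("userId").eq(42),
)
wanted_types = {"typeA", "typeB"}
matches = [
    item for item in resp["Items"]
    if item.get("type") in wanted_types
    and item.get("data", {}).get("reference") == 12345
]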

Indexing notifications table in DynamoDB

I am going to implement a notification system, and I am trying to figure out a good way to store notifications within a database. I have a web application that uses a PostgreSQL database, but a relational database does not seem ideal for this use case; I want to support various types of notifications, each including different data, though a subset of the data is common for all types of notifications. Therefore I was thinking that a NoSQL database is probably better than trying to normalize a schema in a relational database, as this would be quite tricky.
My application is hosted in Amazon Web Services (AWS), and I have been looking a bit at DynamoDB for storing the notifications. This is because it is managed, so I do not have to deal with the operations of it. Ideally, I'd like to have used MongoDB, but I'd really prefer not having to deal with the operations of the database myself. I have been trying to come up with a way to do what I want in DynamoDB, but I have been struggling, and therefore I have a few questions.
Suppose that I want to store the following data for each notification:
An ID
User ID of the receiver of the notification
Notification type
Timestamp
Whether or not it has been read/seen
Meta data about the notification/event (no querying necessary for this)
Now, I would like to be able to query for the most recent X notifications for a given user. Also, in another query, I'd like to fetch the number of unread notifications for a particular user. I am trying to figure out a way that I can index my table to be able to do this efficiently.
I can rule out simply having a hash primary key, as I would not be doing lookups by simply a hash key. I don't know if a "hash and range primary key" would help me here, as I don't know which attribute to put as the range key. Could I have a unique notification ID as the hash key and the user ID as the range key? Would that allow me to do lookups only by the range key, i.e. without providing the hash key? Then perhaps a secondary index could help me to sort by the timestamp, if this is even possible.
I also looked at global secondary indexes, but the problem with these are that when querying the index, DynamoDB can only return attributes that are projected into the index - and since I would want all attributes to be returned, then I would effectively have to duplicate all of my data, which seems rather ridiculous.
How can I index my notifications table to support my use case? Is it even possible, or do you have any other recommendations?
Motivation Note: When using a Cloud Storage like DynamoDB we have to be aware of the Storage Model because that will directly impact
your performance, scalability, and financial costs. It is different
than working with a local database because you pay not only for the
data that you store but also the operations that you perform against
the data. Deleting a record is a WRITE operation for example, so if
you don't have an efficient plan for clean up (and your case being
Time Series Data specially needs one), you will pay the price. Your
Data Model will not show problems when dealing with small data volume
but can definitely ruin your plans when you need to scale. That being
said, decisions like creating (or not) an index, defining proper
attributes for your keys, creating table segmentation, and etc will
make the entire difference down the road. Choosing DynamoDB (or more
generically speaking, a Key-Value store) as any other architectural
decision comes with a trade-off, you need to clearly understand
certain concepts about the Storage Model to be able to use the tool
efficiently, choosing the right keys is indeed important but only the
tip of the iceberg. For example, if you overlook the fact that you are
dealing with Time Series Data, no matter what primary keys or index
you define, your provisioned throughput will not be optimized because
it is spread throughout your entire table (and its partitions) and NOT
ONLY THE DATA THAT IS FREQUENTLY ACCESSED, meaning that unused data is
directly impacting your throughput just because it is part of the same
table. This leads to cases where the
ProvisionedThroughputExceededException is thrown "unexpectedly" when
you know for sure that your provisioned throughput should be enough for your
demand, however, the TABLE PARTITION that is being unevenly accessed
has reached its limits (more details here).
The post below has more details, but I wanted to give you some motivation to read through it and understand that although you can certainly find an easier solution for now, it might mean starting from scratch in the near future when you hit a wall (the "wall" might come as high financial costs, limitations on performance and scalability, or a combination of them all).
Q: Could I have a unique notification ID as the hash key and the user ID as the range key? Would that allow me to do lookups only by the range key, i.e. without providing the hash key?
A: DynamoDB is a Key-Value storage meaning that the most efficient queries use the entire Key (Hash or Hash-Range). Using the Scan operation to actually perform a query just because you don't have your Key is definitely a sign of deficiency in your Data Model in regards to your requirements. There are a few things to consider and many options to avoid this problem (more details below).
Now before moving on, I would suggest you reading this quick post to clearly understand the difference between Hash Key and Hash+Range Key:
DynamoDB: When to use what PK type?
Your case is a typical Time Series Data scenario where your records become obsolete as the time goes by. There are two main factors you need to be careful about:
Make sure your tables have even access patterns
If you put all your notifications in a single table and the most recent ones are accessed more frequently, your provisioned throughput will not be used efficiently.
You should group the most accessed items in a single table so the provisioned throughput can be properly adjusted for the required access. Additionally, make sure you properly define a Hash Key that will allow even distribution of your data across multiple partitions.
The obsolete data is deleted with the most efficient way (effort, performance and cost wise)
The documentation suggests segmenting the data in different tables so you can delete or backup the entire table once the records become obsolete (see more details below).
Here is the section from the documentation that explains best practices related to Time Series Data:
Understand Access Patterns for Time Series Data
For each table that you create, you specify the throughput
requirements. DynamoDB allocates and reserves resources to handle your
throughput requirements with sustained low latency. When you design
your application and tables, you should consider your application's
access pattern to make the most efficient use of your table's
resources.
Suppose you design a table to track customer behavior on your site,
such as URLs that they click. You might design the table with hash and
range type primary key with Customer ID as the hash attribute and
date/time as the range attribute. In this application, customer data
grows indefinitely over time; however, the applications might show
uneven access pattern across all the items in the table where the
latest customer data is more relevant and your application might
access the latest items more frequently and as time passes these items
are less accessed, eventually the older items are rarely accessed. If
this is a known access pattern, you could take it into consideration
when designing your table schema. Instead of storing all items in a
single table, you could use multiple tables to store these items. For
example, you could create tables to store monthly or weekly data. For
the table storing data from the latest month or week, where data
access rate is high, request higher throughput and for tables storing
older data, you could dial down the throughput and save on resources.
You can save on resources by storing "hot" items in one table with
higher throughput settings, and "cold" items in another table with
lower throughput settings. You can remove old items by simply deleting
the tables. You can optionally backup these tables to other storage
options such as Amazon Simple Storage Service (Amazon S3). Deleting an
entire table is significantly more efficient than removing items
one-by-one, which essentially doubles the write throughput as you do
as many delete operations as put operations.
Source:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.TimeSeriesDataAccessPatterns
For example, you could have your tables segmented by month:
Notifications_April, Notifications_May, etc
Q: I would like to be able to query for the most recent X notifications for a given user.
A: I would suggest using the Query operation, querying by the Hash Key (UserId) only and having the Range Key sort the notifications by Timestamp (date and time).
Hash Key: UserId
Range Key: Timestamp
Note: A better solution would be for the Hash Key to contain not only the UserId but also some other concatenated information that you can calculate before querying, to make sure your Hash Key gives you even access patterns across your data. For example, you can start to have hot partitions if notifications from specific users are accessed more than others... having additional information in the Hash Key would mitigate this risk.
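One way to implement that concatenated Hash Key is simple write sharding: append a small shard suffix on write and query every shard on read. A hedged boto3 sketch, in which the table name, shard count, and attribute values are assumptions:
import random
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Notifications_April")  # hypothetical

NUM_SHARDS = 4  # assumption: a small, fixed number of shards per user

def shard_key(user_id, shard):
    return f"{user_id}#{shard}"

# Write: spread one user's notifications across a few partition keys.
table.put_item(Item={
    "UserId": shard_key("user-123", random.randrange(NUM_SHARDS)),
    "Timestamp": "2016-04-20T10:15:00Z",
    "Type": "comment",
})

# Read: query every shard and merge, newest first.
items = []
for shard in range(NUM_SHARDS):
    resp = table.query(
        KeyConditionExpression=Key("UserId").eq(shard_key("user-123", shard)),
        ScanIndexForward=False,
        Limit=20,
    )
    items.extend(resp["Items"])
items.sort(key=lambda i: i["Timestamp"], reverse=True)
latest = items[:20]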
Q: I'd like to fetch the number of unread notifications for a particular user.
A: Create a Global Secondary Index as a Sparse Index having the UserId as the Hash Key and Unread as the Range Key.
Example:
Index Name: Notifications_April_Unread
Hash Key: UserId
Range Key: Unread
When you query this index by Hash Key (UserId) you would automatically have all unread notifications with no unnecessary scans through notifications which are not relevant to this case. Keep in mind that the original Primary Key from the table is automatically projected into the index, so in case you need to get more information about the notification you can always resort to those attributes to perform a GetItem or BatchGetItem on the original table.
Note: You can explore the idea of using different attributes other than the 'Unread' flag, the important thing is to keep in mind that a Sparse Index can help you on this Use Case (more details below).
Detailed Explanation:
I would have a sparse index to make sure that you can query a reduced dataset to do the count. In your case you can have an attribute "unread" to flag if the notification was read or not, and use that attribute to create the Sparse Index. When the user reads the notification you simply remove that attribute from the notification so it doesn't show up in the index anymore. Here are some guidelines from the documentation that clearly apply to your scenario:
Take Advantage of Sparse Indexes
For any item in a table, DynamoDB will only write a corresponding
index entry if the index range key
attribute value is present in the item. If the range key attribute
does not appear in every table item, the index is said to be sparse.
[...]
To track open orders, you can create an index on CustomerId (hash) and
IsOpen (range). Only those orders in the table with IsOpen defined
will appear in the index. Your application can then quickly and
efficiently find the orders that are still open by querying the index.
If you had thousands of orders, for example, but only a small number
that are open, the application can query the index and return the
OrderId of each open order. Your application will perform
significantly fewer reads than it would take to scan the entire
CustomerOrders table. [...]
Instead of writing an arbitrary value into the IsOpen attribute, you
can use a different attribute that will result in a useful sort order
in the index. To do this, you can create an OrderOpenDate attribute
and set it to the date on which the order was placed (and still delete
the attribute once the order is fulfilled), and create the OpenOrders
index with the schema CustomerId (hash) and OrderOpenDate (range).
This way when you query your index, the items will be returned in a
more useful sort order.[...]
Such a query can be very efficient, because the number of items in the
index will be significantly fewer than the number of items in the
table. In addition, the fewer table attributes you project into the
index, the fewer read capacity units you will consume from the index.
Source:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForGSI.html#GuidelinesForGSI.SparseIndexes
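Tying the sparse-index idea back to the notifications case, here is a hedged boto3 sketch; the table's UserId/Timestamp key schema and the index name follow the examples above, and the user id and timestamp values are placeholders.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Notifications_April")  # hypothetical

# Count unread notifications via the sparse index: only items that still carry
# the Unread attribute appear in it at all.
resp = table.query(
    IndexName="Notifications_April_Unread",
    KeyConditionExpression=Key("UserId").eq("user-123"),
    Select="COUNT",
)
unread_count = resp["Count"]

# When the user reads a notification, drop the attribute so the item falls out
# of the sparse index; the UserId/Timestamp key schema is an assumption.
table.update_item(
    Key={"UserId": "user-123", "Timestamp": "2016-04-20T10:15:00Z"},
    UpdateExpression="REMOVE Unread",
)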
Find below some references to the operations that you will need to programmatically create and delete tables:
Create Table
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html
Delete Table
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteTable.html
I'm an active user of DynamoDB and here is what I would do... Firstly, I'm assuming that you need to access notifications individually (e.g. to mark them as read/seen), in addition to getting the latest notifications by user_id.
Table design:
NotificationsTable
id - Hash key
user_id
timestamp
...
UserNotificationsIndex (Global Secondary Index)
user_id - Hash key
timestamp - Range key
id
When you query the UserNotificationsIndex, you set the user_id of the user whose notifications you want and ScanIndexForward to false, and DynamoDB will return the notification ids for that user in reverse chronological order. You can optionally set a limit on how many results you want returned, or get a max of 1 MB.
With regards to projecting attributes, you'll either have to project the attributes you need into the index, or you can simply project the id and then write "hydrate" functionality in your code that does a look up on each ID and returns the specific fields that you need.
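A hedged boto3 sketch of that query plus the "hydrate" step, using the table and index names above; the user id and the limit of 20 are placeholders.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
notifications = dynamodb.Table("NotificationsTable")

# Latest 20 notifications for a user, newest first.
resp = notifications.query(
    IndexName="UserNotificationsIndex",
    KeyConditionExpression=Key("user_id").eq("user-123"),
    ScanIndexForward=False,
    Limit=20,
)
ids = [item["id"] for item in resp["Items"]]

# "Hydrate": fetch the full items from the base table by id.
if ids:
    batch = dynamodb.batch_get_item(
        RequestItems={"NotificationsTable": {"Keys": [{"id": i} for i in ids]}}
    )
    full_items = batch["Responses"]["NotificationsTable"]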
If you really don't like that, here is an alternate solution for you... Set your id as your timestamp. For example, I would use the # of milliseconds since a custom epoch (e.g. Jan 1, 2015). Here is an alternate table design:
NotificationsTable
user_id - Hash key
id/timestamp - Range key
Now you can query the NotificationsTable directly, setting the user_id appropriately and setting ScanIndexForward to false to sort the Range key in descending order. Of course, this assumes that you won't have a collision where a user gets 2 notifications in the same millisecond. This should be unlikely, but I don't know the scale of your system.
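For completeness, a small sketch of one way to build such an id, using the custom epoch mentioned above; the zero padding and the random suffix are additions to reduce the same-millisecond collision risk just noted, and assume the id is stored as a string.
import random
import time
from datetime import datetime, timezone

# Milliseconds since the custom epoch (Jan 1, 2015) mentioned above.
CUSTOM_EPOCH_MS = int(datetime(2015, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)

def new_notification_id():
    ms = int(time.time() * 1000) - CUSTOM_EPOCH_MS
    # Zero padding keeps string ordering consistent with numeric order; the
    # random suffix guards against two notifications in the same millisecond.
    return f"{ms:013d}-{random.randrange(1_000_000):06d}"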