DynamoDB Primary Key strategy - amazon-web-services

I'm dabbling with DynamoDB (using boto3) for the first time, and I'm not sure how to define my Partition Key. I'm used to SQL, where you can use AUTO_INCREMENT to ensure that the Key will always increase.
I haven't seen such an option in DynamoDB - instead, when using put_item, the "primary key attributes are required" - I take this to mean that I have to define the value explicitly (and, indeed, if I leave it off, I get botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the PutItem operation: One or more parameter values were invalid: Missing the key id in the item)
If I already have rows with id 1, 2, 3, ...N, I naturally want the next row that I insert to have Primary Key N+1. But I don't know how to generate that - the solutions given here are all imperfect.
Should I be generating the Primary Key values independently, perhaps by hashing the other values of the item? If I do so, isn't there a (small) chance of hash-collision? Then again, since DynamoDB seems to determine partition based on a hash of the Partition Key, is there any reason for me not to simply use a random sufficiently-long string?

DynamoDB does not support generated keys; you have to specify one yourself. You can't reliably generate sequential IDs.
A common alternative is to use UUIDs.
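A minimal sketch of the UUID approach, assuming the table's partition key attribute is named id (as in the error message quoted in the question). The helper name new_item is hypothetical, and the actual put_item call is only shown as a comment because it needs a real table.

```python
import uuid


def new_item(attributes):
    """Return a copy of `attributes` with a random UUID stored under 'id'.

    'id' is assumed to be the table's partition key attribute name.
    """
    item = dict(attributes)
    # uuid4 is random, so no coordination between writers is needed
    item["id"] = str(uuid.uuid4())
    return item


item = new_item({"name": "Ada Lovelace"})
# With boto3 this would then be written as: table.put_item(Item=item)
```

Because the key is random, items are not ordered by insertion; if ordering matters, it would have to come from a separate attribute such as a timestamp.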

I had the same problem while working through the Build a basic Web Application tutorial.
In module 4 of the tutorial, after modifying the lambda function to write to the DynamoDB table, I had to change ID to Id in the line marked THIS LINE (see below) after which the test worked.
def lambda_handler(event, context):
    # extract values from the event object we got from the Lambda service and store in a variable
    name = event['firstName'] + ' ' + event['lastName']
    # write name and time to the DynamoDB table using the object we instantiated and save response in a variable
    response = table.put_item(
        Item={
            'ID': name,  # <- THIS LINE
            'LatestGreetingTime': now
        })
    # return a properly formatted JSON object
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda, ' + name)
    }
I also had to edit my test input to include a random uuid as shown:
{
    "Id": "560e2227-c738-41d9-ad5a-bcad6a3bc273",
    "firstName": "Ada",
    "lastName": "Lovelace"
}

Related

How to set Time to live in dynamodb item

I am trying to add items to DynamoDB in batch. My table has a composite primary key, i.e. a combination of partition key and sort key. I have enabled time to live on my table, but the metric for deletedItemsCount shows no change.
Here is my code:
import json
import time

import boto3
from botocore.exceptions import ClientError


def generate_item(data):
    item = {
        "pk": data['pk'],
        "ttl": str(int(time.time())),  # current time set for testing
        "data": json.dumps({"data": data}),
        "sk": data['sk']
    }
    return item


def put_input_data(input_data, table_name):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(table_name)
    data_list = input_data["data"]
    try:
        with table.batch_writer() as writer:
            for index, data in enumerate(data_list):
                writer.put_item(Item=generate_item(data))
    except ClientError as exception_message:
        raise
On querying the table I can see that the items are being added, but the graph for deletedItemsCount shows no change.
Can someone point out where I am going wrong? I would appreciate any hint.
Thanks
It looks like your ttl attribute is a String, but...
The TTL attribute’s value must be a Number data type. For example, if you specify for a table to use the attribute name expdate as the TTL attribute, but the attribute on an item is a String data type, the TTL processes ignore the item.
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html#time-to-live-ttl-before-you-start-formatting
Hope that resolves your issue.
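A minimal sketch of the fix, reusing the generate_item function and attribute names from the question. The ttl_seconds parameter is an assumption added for illustration; the point is only that the value is a Python int, which boto3 stores as the DynamoDB Number type.

```python
import json
import time


def generate_item(data, ttl_seconds=3600):
    """Build an item whose 'ttl' attribute is a Number (epoch seconds)."""
    return {
        "pk": data["pk"],
        "sk": data["sk"],
        # int, not str: boto3 serialises Python ints as the Number type
        "ttl": int(time.time()) + ttl_seconds,
        "data": json.dumps({"data": data}),
    }


item = generate_item({"pk": "a", "sk": "b"}, ttl_seconds=0)
```

Even with a correct Number value, deletion is not immediate, so the metric may take a while to move.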
The implementation of time-to-live (TTL) is different in different databases and you shouldn't assume a specific implementation in DynamoDB.
The usual requirement of TTL is that the item is no longer visible to reads or writes after the TTL period, not necessarily that it is evicted from the table by that time. DynamoDB does not enforce this for you: items that have expired but have not yet been deleted still appear in reads, queries, and scans.
UPDATE: Based on the comment below from @Nadav Har'El, it is your responsibility to check the validity of the items using the TTL value (documentation here).
The actual deletion or eviction is done by a sweeper that goes over the table periodically. Please also note that the deletion after TTL is a system-delete compared to a standard delete by a delete command from a client. If you are processing the DynamoDB stream you should be aware of that difference. You can read more about TTL and DynamoDB streams here.
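Since the sweeper runs on its own schedule, a reader may still receive items whose TTL has already passed. The sketch below shows one way to drop them on the client side. The function name is_live and the default attribute name ttl are assumptions for illustration, not part of any DynamoDB API.

```python
import time


def is_live(item, ttl_attribute="ttl", now=None):
    """Return True if the item has no TTL or its TTL is still in the future."""
    now = int(time.time()) if now is None else now
    expires_at = item.get(ttl_attribute)
    if expires_at is None:
        return True
    # boto3 returns numbers as Decimal; int() handles both int and Decimal
    return int(expires_at) > now


items = [{"pk": "a", "ttl": 100}, {"pk": "b", "ttl": 300}, {"pk": "c"}]
live = [item for item in items if is_live(item, now=200)]
# live keeps "b" (expires later) and "c" (no TTL attribute)
```

The same check can be pushed to the server with a filter expression that compares the TTL attribute with the current epoch time.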

Is there a way how to address nested properties in AWS DynamoDB for purpose of documentClient.query() call?

I am currently testing how to design a query with the AWS.DynamoDB.DocumentClient query() call, which takes params: DocumentClient.QueryInput and is used for retrieving a collection of data from a table in DynamoDB.
The query is simple and works fine with indexes of type String or Number. What I am not able to write is a query that uses a valid index and filters on an attribute that is nested (please see my data structure).
I am using FilterExpression, where the filtering logic can be defined, and that works fine in all cases except when I try to filter on a nested attribute.
These are the parameters I am currently passing to query():
var params = {
    TableName: 'myTable',
    ProjectionExpression: 'HashKey, RangeKey, Artist, #SpecialStatus, Message, Track, Statistics',
    ExpressionAttributeNames: { '#SpecialStatus': 'Status' },
    IndexName: 'Artist-index',
    KeyConditionExpression: 'Artist = :ArtistName',
    ExpressionAttributeValues: {
        ':ArtistName': 'Blind Guardian',
        ':Track': 'Mirror Mirror'
    },
    FilterExpression: 'Track = :Track'
};
Data structure in DynamoDB's table:
{
    'Artist' : 'Blind Guardian',
    ..
    'Track': 'Mirror Mirror',
    'Statistics' : [
        {
            'Sales': 42,
            'WrittenBy' : 'Kursch'
        }
    ]
}
Let's assume we want to select all entries from the DB by using Artist in KeyConditionExpression. We can achieve this by binding Artist to :ArtistName. Now the question: how do I retrieve only the records whose WrittenBy, which is nested in Statistics, matches a given value?
To the best of my knowledge, we cannot use any type other than String, Number or Binary for secondary index keys. I've been experimenting with secondary indexes and sort keys as well, but without luck.
I've tried documentClient.scan(), same story. Still no luck with accessing nested attributes in a List (FilterExpression just won't accept it).
I am aware of the possibility of filtering the result on the "application" side once the records are retrieved (by Artist, for instance), but I am interested in filtering in FilterExpression.
If I understand your problem correctly, you'd like to create a query that filters on the value of a complex attribute (in this case, a list of objects).
You can filter on the contents of a list by indexing into the list:
var params = {
    TableName: "myTable",
    FilterExpression: "Statistics[0].WrittenBy = :writtenBy",
    ExpressionAttributeValues: {
        ":writtenBy": 'Kursch'
    }
};
Of course, if you don't know the specific index, this won't really help you.
Alternatively, you could use the CONTAINS function to test if the object exists in your list. The CONTAINS function will require all the attributes in the object to match the condition. In this case, you'd need to provide Sales and WrittenBy, which probably doesn't solve your problem here.
The shape of your data is making your access pattern difficult to implement, but that is often the case with DDB. You are asking DDB to support a query of a list of objects, where the object has a specific attribute with a specific value. As you've seen, this is quite tricky to do. As you know, getting the data model to correctly support your access patterns is critical to your success with DDB. It can also be difficult to get right!
A couple of ideas that would make your access pattern easier to implement:
Move WrittenBy out of the complex attribute and put it alongside the other top-level attributes. This would allow you to use a simple FilterExpression on the WrittenBy attribute.
If the WrittenBy attribute must stay within the Statistics list, make it stand alone (e.g. [{writtenBy: Kursch}, {Sales: 42},...]). This way, you'd be able to use the CONTAINS keyword in your search.
Create a secondary index with the WrittenBy field in either the PK or SK (whichever makes sense for your data model and access patterns).
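The first idea above could look roughly like the sketch below, written in Python with boto3 in mind rather than the JavaScript DocumentClient. Both helper names are hypothetical. The returned dictionary is meant to be passed to a boto3 Table.query call, and it assumes the Artist-index from the question projects the new WrittenBy attribute.

```python
def promote_written_by(item):
    """Copy WrittenBy from the first Statistics entry to a top-level attribute."""
    promoted = dict(item)
    stats = item.get("Statistics") or []
    if stats and "WrittenBy" in stats[0]:
        promoted["WrittenBy"] = stats[0]["WrittenBy"]
    return promoted


def written_by_query_kwargs(artist, written_by):
    """Build Table.query arguments that filter on the top-level WrittenBy."""
    return {
        "IndexName": "Artist-index",
        "KeyConditionExpression": "Artist = :artist",
        "FilterExpression": "WrittenBy = :writtenBy",
        "ExpressionAttributeValues": {
            ":artist": artist,
            ":writtenBy": written_by,
        },
    }


item = promote_written_by({
    "Artist": "Blind Guardian",
    "Track": "Mirror Mirror",
    "Statistics": [{"Sales": 42, "WrittenBy": "Kursch"}],
})
kwargs = written_by_query_kwargs("Blind Guardian", "Kursch")
# With a boto3 Table resource: table.query(**kwargs)
```

The filter is still applied after the key condition has selected the items, so it reduces the result set but not the read cost; the secondary index idea addresses that part.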

Inserting query parameters into DynamoDB using Boto3

I am trying to get my serverless function working while I try my hand at this.
I am trying to implement an API PUT method, which will be integrated with a proxy Lambda function.
I have a Lambda function as below:
import boto3


def lambda_handler(event, context):
    param = event['queryStringParameters']
    dynamodb = boto3.resource('dynamodb', region_name="us-east-1")
    table = dynamodb.Table('*****')
    response = table.put_item(
        Item={
        }
    )
I want to insert the param value, which I am getting from the query parameters, into the DynamoDB table.
I am able to achieve it with:
response = table.put_item(
    Item=param
)
But the issue is that if the partition key is already present, the call just overwrites the existing item instead of raising an error about the existing partition key.
I know the PUT method is idempotent.
Is there any other way I can achieve this?
Per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.put_item, you can:
"perform a conditional put operation (add a new item if one with the
specified primary key doesn't exist)"
Note
To prevent a new item from replacing an existing item, use a
conditional expression that contains the attribute_not_exists function
with the name of the attribute being used as the partition key for the
table. Since every record must contain that attribute, the
attribute_not_exists function will only succeed if no matching item
exists.
Also see DynamoDB: updateItem only if it already exists
If you really need to know whether the item exists or not so you can trigger your exception logic, then run a query first to see if the item already exists and don't even call put_item. You can also explore whether using a combination of ConditionExpression and one of the ReturnValues options (for put_item or update_item) may return enough data for you to know if an item existed.
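A minimal sketch of the conditional put described in the quoted note. The helper name conditional_put_kwargs and the key name id are assumptions for illustration; the function only builds the arguments, so it can be tried without a table.

```python
def conditional_put_kwargs(item, partition_key):
    """Build put_item arguments that refuse to overwrite an existing item."""
    if partition_key not in item:
        raise ValueError(f"item is missing the key attribute {partition_key!r}")
    return {
        "Item": item,
        # Succeeds only if no item with this primary key exists yet
        "ConditionExpression": "attribute_not_exists(#pk)",
        # Placeholder avoids clashes with DynamoDB reserved words
        "ExpressionAttributeNames": {"#pk": partition_key},
    }


kwargs = conditional_put_kwargs({"id": "42", "name": "Ada"}, partition_key="id")
# table.put_item(**kwargs) raises botocore.exceptions.ClientError with the
# error code "ConditionalCheckFailedException" if the key already exists.
```

In the Lambda handler, catching that error code and returning an appropriate status would replace the silent overwrite with an explicit failure, without a separate read beforehand.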

Querying nested attributes in Amazon DynamoDB

How can I efficiently query on nested attributes in Amazon DynamoDB?
I have a document structure as below, which lets me store related information in the document itself (rather than referencing it).
It makes sense to store the seminars nested in the course, since they will likely be queried alongside the course (they are all course-specific, i.e. a course has many seminars, and a seminar belongs to a course).
In CouchDB, which I’m migrating from, I could write a View that would project some nested attributes for querying. I understand that I can’t project anything that isn’t a top-level attribute into a dynamodb secondary index, so this approach doesn’t seem to work.
This brings me back to the question: how can I efficiently query on nested attributes without scanning, if I can’t use them as keys in an index?
For example, if I want to get average attendance at Nelson Mandela Theatre, how can I query for the values of registrations and attendees in all seminars that have a location of “Nelson Mandela Theatre” without resorting to a scan?
{
    "course_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "tutors": ["Cognito-sub-1", "Cognito-sub-2"],
    "seminars": [
        {
            "seminar_id": "XXXYYY-12345",
            "epoch_time": "123456789",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "How to lie with statistics",
            "registrations": "92",
            "attendees": "61"
        },
        {
            "seminar_id": "BBBCCC-44444",
            "epoch_time": "155555555",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "Statistical significance for dog owners",
            "registrations": "244",
            "attendees": "240"
        },
        {
            "seminar_id": "XXXAAA-54321",
            "epoch_time": "223456789",
            "duration": "4000",
            "location": "Starbucks",
            "name": "Is feral cat population growth a leading indicator for the S&P 500?",
            "registrations": "40"
        }
    ]
}
{
    "course_id": "CJX-5553389",
    "course_name": "Cat Health 101",
    "tutors": ["Cognito-sub-4", "Cognito-sub-9"],
    "seminars": [
        {
            "seminar_id": "TTRHJK-43278",
            "epoch_time": "123456789",
            "duration": "5400",
            "location": "Catwoman Hall",
            "name": "Emotional support octopi for cats",
            "registrations": "88",
            "attendees": "87"
        },
        {
            "seminar_id": "BBBCCC-44444",
            "epoch_time": "123666789",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "Statistical significance for cat owners",
            "registrations": "44",
            "attendees": "44"
        }
    ]
}
An index cannot be created on nested attributes (i.e. attributes inside DynamoDB document data types).
Document Types – A document type can represent a complex structure
with nested attributes—such as you would find in a JSON document. The
document types are list and map.
Query API:
A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
Scan API:
A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.
To use the Query API, the hash key value is required. The OP does not indicate that the hash key value is available. According to the OP, the data needs to be queried by the location attribute, which sits inside a DynamoDB List data type. The next option to look at is a GSI.
Please read more about GSIs. One of the rules is that a GSI can be created from top-level attributes only, so location can't be used to create the index.
That rules out creating a GSI in order to use the Query API as well.
The index key attributes can consist of any top-level String, Number,
or Binary attributes from the base table; other scalar types, document
types, and set types are not allowed.
For the reasons above, the Query API can't be used to get the data by the location attribute, assuming the hash key value is not available.
If the hash key value is available, a FilterExpression can be used to filter the data. The only way to filter data held in a complex list data type is the CONTAINS function. To use CONTAINS, every attribute of the list element must match (i.e. seminar_id, location, duration and all the other attributes). So it is not possible to fulfil the use case mentioned in the OP with the current data model.
Proposed alternative solution:
Re-modeling the data structure as shown below could resolve the problem. There is no other solution available to fulfil the use case using the Query API.
Main Table :-
Course Id - Hash Key
seminar_id - Sort Key
GSI :-
Seminar location - Hash Key
Course Id - Sort Key
In a DynamoDB table, each key value must be unique. However, the key
values in a global secondary index do not need to be unique.
Now you can use the Query API on the GSI to get the data where the seminar location equals Nelson Mandela Theatre. You can also include the course id in the query if you know the value. The query may return multiple items in the result set. You can use a FilterExpression if you would like to filter the data further on non-key attributes.
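A rough sketch of that re-modeling, assuming one item per seminar with course_id as hash key and seminar_id as sort key. The helper name seminar_items and the index name location-index are hypothetical; the query arguments are meant for a boto3 Table.query call against the proposed GSI.

```python
def seminar_items(course):
    """Yield one item per seminar, carrying the parent course's id and name."""
    for seminar in course.get("seminars", []):
        item = dict(seminar)
        item["course_id"] = course["course_id"]      # table hash key
        item["course_name"] = course["course_name"]
        yield item                                   # seminar_id is the sort key


course = {
    "course_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "seminars": [
        {"seminar_id": "XXXYYY-12345", "location": "Nelson Mandela Theatre",
         "registrations": "92", "attendees": "61"},
        {"seminar_id": "XXXAAA-54321", "location": "Starbucks",
         "registrations": "40"},
    ],
}
items = list(seminar_items(course))

# 'location' is now a top-level attribute, so it can be a GSI hash key.
query_kwargs = {
    "IndexName": "location-index",  # hypothetical index name
    "KeyConditionExpression": "#loc = :loc",
    "ExpressionAttributeNames": {"#loc": "location"},
    "ExpressionAttributeValues": {":loc": "Nelson Mandela Theatre"},
}
# With a boto3 Table resource: table.query(**query_kwargs)
```

Averaging attendance would then be a matter of summing registrations and attendees over the items the query returns.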
This is an example from here that uses a filter expression. It is written for a scan operation, but you may be able to apply something similar to a query instead (take a look at the API):
{
    "TableName": "MyTable",
    "FilterExpression": "#k_Compatible.#k_RAM = :v_Compatible_RAM",
    "ExpressionAttributeNames": {
        "#k_Compatible": "Compatible",
        "#k_RAM": "RAM"
    },
    "ExpressionAttributeValues": {
        ":v_Compatible_RAM": "RAM1"
    }
}
There is one thing you can do to make this work with Scan.
Store the object in stringified form, like
{
    "language": "[{\"language\":\"Male\",\"proficiency\":\"Female\"}]"
}
and then perform the scan operation with
language: {
    contains: "Male"
}
On the client side you can then call JSON.parse(language).
I don't have much experience with DynamoDB yet, but I started studying it since I'm planning to use it for my next project.
As far as I can tell from the AWS documentation, the answer to your question is: it's not possible to efficiently query on nested attributes.
Looking at Best Practices, especially Best Practices for Using Secondary Indexes in DynamoDB, the right approach seems to be using different item types under the same partition key, as shown here. Under the same course_id you would then have a generic sort key (sk). The first item would have sk = 'Details' with the course's data, followed by other items such as "seminar-1" with its data, and so on.
You would then define the seminar properties you want to query as a GSI (Global Secondary Index), bearing in mind that the number of GSIs per table is limited (the default quota was 5 at one time and is now 20).
Hope it helps.
You can use document paths to filter the values. Because seminars is a list, the path needs an element index, e.g. seminars[0].location.

Fulltext Search DynamoDB

Following situation:
I'm storing elements in DynamoDB for my customers. The hash key is an element ID and the range key is the customer ID. In addition to these fields I'm storing an array of strings -> tags (e.g. ["Pets", "House"]) and a multiline text.
I want to provide a search function in my application, where the user can type free text or select tags and get all related elements.
In my opinion a plain DB query is not the correct solution. I was playing around with CloudSearch, but I'm not really sure if this is the correct solution, because every time the user adds a tag the index must be updated...
I hope you have some hints for me.
DynamoDB is now integrated with Elasticsearch, enabling you to perform
full-text queries on your data.
https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-dynamodb-elasticsearch-integration/
DynamoDB streams are used to keep the search index up-to-date.
You can use an instant-search engine like Typesense to search through data in your DynamoDB table:
https://github.com/typesense/typesense
There's also ElasticSearch, but it has a steep learning curve and can become a beast to manage, given the number of features and configuration options it supports.
At a high level:
Turn on DynamoDB streams
Set up an AWS Lambda trigger to listen to these change events
Write code inside your lambda function to index data into Typesense:
import typesense


def lambda_handler(event, context):
    client = typesense.Client({
        'nodes': [{
            'host': '<Endpoint URL>',
            'port': '<Port Number>',
            'protocol': 'https',
        }],
        'api_key': '<API Key>',
        'connection_timeout_seconds': 2
    })
    processed = 0
    for record in event['Records']:
        ddb_record = record['dynamodb']
        if record['eventName'] == 'REMOVE':
            # delete the document whose id matches the removed item
            res = client.collections['<collection-name>'].documents[str(ddb_record['OldImage']['id']['N'])].delete()
        else:
            document = ddb_record['NewImage']  # format your document here and then use the upsert function to index it.
            res = client.collections['<collection-name>'].documents.upsert(document)
        print(res)
        processed = processed + 1
    print('Successfully processed {} records'.format(processed))
    return processed
Here's a detailed article from Typesense's docs on how to do this: https://typesense.org/docs/0.19.0/guide/dynamodb-full-text-search.html
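The "format your document here" step is left open in the handler above. One possible sketch is shown below: it converts a stream image from DynamoDB's typed JSON into a plain dictionary. The function names unmarshal and to_document are hypothetical, only a subset of DynamoDB types is handled, and the item is assumed to have an id attribute as in the delete branch above.

```python
def unmarshal(value):
    """Convert one DynamoDB-typed value, e.g. {'S': 'x'}, to a plain value."""
    (type_tag, payload), = value.items()
    if type_tag in ("S", "BOOL"):
        return payload
    if type_tag == "N":
        # Numbers arrive as strings in stream records
        try:
            return int(payload)
        except ValueError:
            return float(payload)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unmarshal(v) for v in payload]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in payload.items()}
    if type_tag == "SS":
        return list(payload)
    if type_tag == "NS":
        return [unmarshal({"N": n}) for n in payload]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def to_document(image):
    """Flatten a NewImage into a document with a string 'id'."""
    doc = {name: unmarshal(value) for name, value in image.items()}
    doc["id"] = str(doc["id"])  # Typesense document ids are strings
    return doc


new_image = {
    "id": {"N": "7"},
    "tags": {"L": [{"S": "Pets"}, {"S": "House"}]},
    "text": {"S": "hello"},
}
document = to_document(new_image)
```

boto3 also ships a TypeDeserializer in boto3.dynamodb.types that covers all types, although it returns numbers as Decimal, which would need converting before JSON serialisation.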
DynamoDB just added PartiQL, a SQL-compatible language for querying data. You can use the contains() function to find a value within a set (or a substring): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-functions.contains.html
In your specific case you need Elasticsearch, but you can do a limited wildcard-style text search on the sort key. The linked page gives these SQL examples alongside their DynamoDB equivalents:
/* Return all of the songs by an artist, matching first part of title */
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE 'Call%';
/* Return all of the songs by an artist, with a particular word in the title...
...but only if the price is less than 1.00 */
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE '%Today%'
AND Price < 1.00;
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Query.html
This is the advantage of using DynamoDB as a managed service from AWS: you get several managed components in addition to the managed NoSQL database.
If you are using the downloadable version of DynamoDB, then you need to build your own Elasticsearch cluster and index the DynamoDB data yourself.