I am trying to have my server less function working as i am trying my hands on it.
I am trying to perform API PUT method , which will be integrated with proxy lambda function
I have a lambda function as below:
def lambda_handler(event, context):
param = event['queryStringParameters']
dynamodb = boto3.resource('dynamodb', region_name="us-east-1")
table = dynamodb.Table('*****')
response = table.put_item(
Item = {
}
)
i want to insert the Param value which i am getting from query parameters into DynamoDB table.
I am able to achieve it by :
response = table.put_item(
Item = param
)
But the issue here is if the partition key is present it will just over ride the value in place of throwing an error of present partition key.
I know the PUT method is idempotent.
Is there any other way i can achieve this ?
Per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.put_item, you can:
"perform a conditional put operation (add a new item if one with the
specified primary key doesn't exist)"
Note
To prevent a new item from replacing an existing item, use a
conditional expression that contains the attribute_not_exists function
with the name of the attribute being used as the partition key for the
table. Since every record must contain that attribute, the
attribute_not_exists function will only succeed if no matching item
exists.
Also see DynamoDB: updateItem only if it already exists
If you really need to know whether the item exists or not so you can trigger your exception logic, then run a query first to see if the item already exists and don't even call put_item. You can also explore whether using a combination of ConditionExpression and one of the ReturnValues options (for put_item or update_item) may return enough data for you to know if an item existed.
I want to retrieve all the items from my table without specifying any particular parameter, I can do it using Key Pair, but want to get all items. How to do it?
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Email')
response = table.get_item(
Key={
"id": "2"
}
)
item = response['Item']
print(item)
This way I can do, but how to retrieve all items? is there any method?
If you want to retrieve all items you will need to use the Scan command.
You can do this by running
response = table.scan()
Be aware that running this will utilise a large number of read credits (RCU). If you're using eventual consistency 1 RCU will be equal to 2 items (under 4KB) and strongly consistent will be 1 item per each RCU (under 4KB).
Here is the consideration page for scans vs queries in AWS documentation.
How can I efficiently query on nested attributes in Amazon DynamoDB?
I have a document structure as below, which lets me store related information in the document itself (rather than referencing it).
It makes sense to store the seminars nested in the course, since they will likely be queried alongside the course (they are all course-specific, i.e. a course has many seminars, and a seminar belongs to a course).
In CouchDB, which I’m migrating from, I could write a View that would project some nested attributes for querying. I understand that I can’t project anything that isn’t a top-level attribute into a dynamodb secondary index, so this approach doesn’t seem to work.
This brings me back to the question: how can I efficiently query on nested attributes without scanning, if I can’t use them as keys in an index?
For example, if I want to get average attendance at Nelson Mandela Theatre, how can I query for the values of registrations and attendees in all seminars that have a location of “Nelson Mandela Theatre” without resorting to a scan?
{
“course_id”: “ABC-1234567”,
“course_name”: “Statistics 101”,
“tutors”: [“Cognito-sub-1”, “Cognito-sub-2”],
“seminars”: [
{
“seminar_id”: “XXXYYY-12345”,
“epoch_time”: “123456789”,
“duration”: “5400”,
“location”: “Nelson Mandela Theatre”,
“name”: “How to lie with statistics”,
“registrations”: “92”,
“attendees”: “61”
},
{
“seminar_id”: “BBBCCC-44444”,
“epoch_time”: “155555555”,
“duration”: “5400”,
“location”: “Nelson Mandela Theatre”,
“name”: “Statistical significance for dog owners”,
“registrations”: “244”,
“attendees”: “240”
},
{
“seminar_id”: “XXXAAA-54321”,
“epoch_time”: “223456789”,
“duration”: “4000”,
“location”: “Starbucks”,
“name”: “Is feral cat population growth a leading indicator for the S&P 500?”,
“registrations”: “40”
}
]
}
{
“course_id”: “CJX-5553389”,
“course_name”: “Cat Health 101”,
“tutors”: [“Cognito-sub-4”, “Cognito-sub-9”],
“seminars”: [
{
“seminar_id”: “TTRHJK-43278”,
“epoch_time”: “123456789”,
“duration”: “5400”,
“location”: “Catwoman Hall”,
“name”: “Emotional support octopi for cats”,
“registrations”: “88”,
“attendees”: “87”
},
{
“seminar_id”: “BBBCCC-44444”,
“epoch_time”: “123666789”,
“duration”: “5400”,
“location”: “Nelson Mandela Theatre”,
“name”: “Statistical significance for cat owners”,
“registrations”: “44”,
“attendees”: “44”
}
]
}
Index cannot be created for nested attributes (i.e. document data types in Dynamodb).
Document Types – A document type can represent a complex structure
with nested attributes—such as you would find in a JSON document. The
document types are list and map.
Query Api:-
A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
Scan API:-
A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.
In order to use Query API, the hash key value is required. The OP doesn't have any information that hash key value is available. As per OP, the data needs to be queried by location attribute which is inside the Dynamodb List data type. Now, the option is to look at GSI.
Kindly read more about the GSI. One of the rules is that GSI can be created using top level attributes only. So, the location can't be used to create the index.
So, creating the GSI in order to use Query API has been ruled out as well.
The index key attributes can consist of any top-level String, Number,
or Binary attributes from the base table; other scalar types, document
types, and set types are not allowed.
Because of the above mentioned reasons, the Query API can't be used to get the data based on location attribute assuming hash key value is not available.
If hash key value is available, FilterExpression can be used to filter the data. Only way to filter the data present in the complex list data type is CONTAINS function. In order to use CONTAINS function, all the attributes in the occurrence is required to match the data (i.e. seminar_id, location, duration and all other attributes). So, it is definitely not possible to fulfil the use case mentioned in the OP using the current data model.
Proposed alternate solution:-
Re-modeling the data structure as mentioned below could be an option to resolve the problem. There is definitely no other solution available to fulfil the use case using Query API.
Main Table :-
Course Id - Hash Key
seminar_id - Sort Key
GSI :-
Seminar location - Hash Key
Course Id - Sort Key
In a DynamoDB table, each key value must be unique. However, the key
values in a global secondary index do not need to be unique.
Now, you can use the Query API on GSI to get the data for Seminar location is equal to Nelson Mandela Theatre. You can use the course id in the query api if you know the value. The query api will potentially give multiple items in the result set. You can use FilterExpression if you would like to further filter the data based on some non key attributes.
This is an example from here where you use a filter expression, it is with a scan operation, but maybe you can apply something similar for query instead of scan (take a look at the API):
{
"TableName": "MyTable",
"FilterExpression": "#k_Compatible.#k_RAM = :v_Compatible_RAM",
"ExpressionAttributeNames": {
"#k_Compatible": "Compatible",
"#k_RAM": "RAM"
},
"ExpressionAttributeValues": {
":v_Compatible_RAM": "RAM1"
}
}
You can do one thing to make it working on Scan
Store the object in stringify format like
{
"language": "[{\"language\":\"Male\",\"proficiency\":\"Female\"}]"
}``
and then can perform scan operation
language: {
contains: "Male"
}
on client side you can perform JSON.parse(language)
I have not such experience with DynamoDB yet but started setudying it since I'm planning on use it for my next project.
As far as I could understand from AWS documentation, the answer to your question is: it's not possible to efficiently query on nested attributes.
Looking at Best Practices, spetially Best Practices for Using Secondary Indexes in DynamoDB, it's possible to understand that the right approach should be using diffent line types under the same Partition Key as shown here. Then under the same course_id you would have a generic sorting key(sk). The first register would then have sk = 'Details' with course's data, then other registers like "seminar-1" and it's data, and so on.
You would then set seminar's properties you would like to query as SGI (Secondary Global Index) bearing in mind that it can only have 5 SGI per table.
Hope it helps.
You can use document paths to filter the values. Use seminars.location as the document path.
I'm dabbling with DynamoDB (using boto3) for the first time, and I'm not sure how to define my Partition Key. I'm used to SQL, where you can use AUTO_INCREMENT to ensure that the Key will always increase.
I haven't seen such an option in DynamoDB - instead, when using put_item, the "primary key attributes are required" - I take this to mean that I have to define the value explicitly (and, indeed, if I leave it off, I get botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the PutItem operation: One or more parameter values were invalid: Missing the key id in the item)
If already have rows with id 1, 2, 3, ...N, I naturally want the next row that I insert to have Primary Key N+1. But I don't know how to generate that - the solutions given here are all imperfect.
Should I be generating the Primary Key values independently, perhaps by hashing the other values of the item? If I do so, isn't there a (small) chance of hash-collision? Then again, since DynamoDB seems to determine partition based on a hash of the Partition Key, is there any reason for me not to simply use a random sufficiently-long string?
DynamoDb does not support generated keys, you have to specify one yourself. You can't reliably generate sequential IDs.
One common way is instead to use UUIDs.
I had the same problem while working through the Build a basic Web Application tutorial.
In module 4 of the tutorial, after modifying the lambda function to write to the DynamoDB table, I had to change ID to Id in the line marked THIS LINE (see below) after which the test worked.
def lambda_handler(event, context):
# extract values from the event object we got from the Lambda service and store in a variable
name = event['firstName'] +' '+ event['lastName']
# write name and time to the DynamoDB table using the object we instantiated and save response in a variable
response = table.put_item(
Item={
'ID': name, <- THIS LINE
'LatestGreetingTime':now
})
# return a properly formatted JSON object
return {
'statusCode': 200,
'body': json.dumps('Hello from Lambda, ' + name)
}
I also had to edit my test input to include a random uuid as shown:
{
"Id": "560e2227-c738-41d9-ad5a-bcad6a3bc273",
"firstName": "Ada",
"lastName": "Lovelace"
}
I have a dynamodb table. And I want to build a page where I can see the items in the table. But since this could have tens of thousand of items, I want to see them in 10 items per page. How do I do that? How to scan items 1000 to 2000?
import boto
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
res = table.scan(attributes_to_get=['id'], max_results=10)
for i in res:
print i
What do you mean by 1000~2000 items?
There is no global order of hash keys (primary or index), thus it's hard to define the 10000~20000 items in advance.
However, it makes perfect sense if you'd like to find the next 1000 items, given the last return item. To fetch the next page, you execute the scan method again by providing the primary key value of the last item in the previous page so that the scan method can return the next set of items. The parameter name is exclusive_start_key and its initial value is None.
See more details in official docs.