How to retrieve all the items from DynamoDB using boto3? - amazon-web-services

I want to retrieve all the items from my table without specifying any particular parameter. I can fetch a single item by its key, but I want to get all items. How can I do it?
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Email')
response = table.get_item(
    Key={
        "id": "2"
    }
)
item = response['Item']
print(item)
This works for a single item, but how do I retrieve all items? Is there a method for that?

If you want to retrieve all items you will need to use the Scan command.
You can do this by running
response = table.scan()
Be aware that a scan reads every item in the table, so it can consume a large number of read capacity units (RCUs). One RCU covers one strongly consistent read per second of up to 4 KB, or two eventually consistent reads of up to 4 KB each.
Here is the consideration page for scans vs queries in AWS documentation.
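A single scan call returns at most 1 MB of data, so getting every item usually means following LastEvaluatedKey until it is absent. A minimal sketch of that loop, assuming `table` is a boto3 Table resource like the one in the question (the helper name `scan_all` is made up for illustration):

```python
def scan_all(table):
    """Return every item in the table by following LastEvaluatedKey."""
    items = []
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # No more pages: the scan reached the end of the table.
            break
        # Resume the next call where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
    return items
```

With the table from the question, `scan_all(table)` would return a list of item dicts. For a large table this holds everything in memory, so processing page by page may be the better choice.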

Related

How to set Time to live in dynamodb item

I am trying to add items to DynamoDB in batch. My table has a composite primary key, i.e. a combination of partition key and sort key. I have enabled time to live on my table, but the deletedItemsCount metric is showing no change.
Following is my code :-
import json
import time
import boto3
from botocore.exceptions import ClientError
def generate_item(data):
    item = {
        "pk": data['pk'],
        "ttl": str(int(time.time())),  # current time set for testing
        "data": json.dumps({"data": data}),
        "sk": data['sk']
    }
    return item
def put_input_data(input_data, table_name):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(table_name)
    data_list = input_data["data"]
    try:
        with table.batch_writer() as writer:
            for index, data in enumerate(data_list):
                writer.put_item(Item=generate_item(data))
    except ClientError as exception_message:
        raise
On querying the table I can see the items being added, but the graph for deletedItemsCount shows no change.
Can someone point out where I am going wrong? I would appreciate any hint.
Thanks
It looks like your ttl attribute is a String, but...
The TTL attribute’s value must be a Number data type. For example, if you specify for a table to use the attribute name expdate as the TTL attribute, but the attribute on an item is a String data type, the TTL processes ignore the item.
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html#time-to-live-ttl-before-you-start-formatting
Hope that resolves your issue.
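As a rough sketch of the fix, the item builder from the question can store the TTL as a Python int, which boto3 writes as a DynamoDB Number. The `ttl_seconds` parameter and its default are assumptions added for illustration; the attribute names come from the question.

```python
import json
import time


def generate_item(data, ttl_seconds=3600):
    """Build an item whose "ttl" attribute is a Number (epoch seconds)."""
    return {
        "pk": data["pk"],
        "sk": data["sk"],
        "data": json.dumps({"data": data}),
        # int, not str: the TTL process ignores String values.
        "ttl": int(time.time()) + ttl_seconds,
    }
```

Even with a valid Number, expired items are removed in the background rather than at the exact expiry time, so the metric may lag behind.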
The implementation of time-to-live (TTL) is different in different databases and you shouldn't assume a specific implementation in DynamoDB.
The usual requirement of TTL is that the object is no longer visible to reads or writes after the TTL period, not necessarily that it is evicted from the table by that time. DynamoDB does not enforce this on access: an item that has expired but has not yet been deleted still appears in reads, queries, and scans.
UPDATE: Based on the comment below from @Nadav Har'El, it is your responsibility to check the validity of the items using the TTL value, for example with a filter expression (documentation here).
The actual deletion or eviction is done by a sweeper that goes over the table periodically. Please also note that the deletion after TTL is a system-delete compared to a standard delete by a delete command from a client. If you are processing the DynamoDB stream you should be aware of that difference. You can read more about TTL and DynamoDB streams here.
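Both points can be sketched in a few lines. The first helper hides expired items on the client side. The second recognises a TTL deletion in a stream record, relying on the userIdentity fields that, to my knowledge, DynamoDB sets on system deletes as they appear in a Lambda stream event. The function names and the "ttl" attribute name are assumptions for illustration.

```python
import time


def is_live(item, ttl_attribute="ttl", now=None):
    """True if the item has no TTL or its TTL is still in the future."""
    now = int(time.time()) if now is None else now
    expires_at = item.get(ttl_attribute)
    return expires_at is None or int(expires_at) > now


def is_ttl_delete(record):
    """True if a stream record is a REMOVE performed by the TTL process."""
    identity = record.get("userIdentity", {})
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )
```

A delete issued by a client has no such userIdentity block, so `is_ttl_delete` returns False for it.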

Inserting query parameters into DynamoDB using Boto3

I am trying to get my serverless function working while I try my hand at it.
I am trying to implement an API PUT method, which will be integrated with a proxy Lambda function.
I have a Lambda function as below:
import boto3
def lambda_handler(event, context):
    param = event['queryStringParameters']
    dynamodb = boto3.resource('dynamodb', region_name="us-east-1")
    table = dynamodb.Table('*****')
    response = table.put_item(
        Item = {
        }
    )
I want to insert the param value, which I am getting from the query parameters, into the DynamoDB table.
I am able to achieve it with:
response = table.put_item(
    Item = param
)
But the issue here is that if the partition key is already present, it just overwrites the existing item instead of raising an error about the existing partition key.
I know the PUT method is idempotent.
Is there any other way I can achieve this?
Per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.put_item, you can:
"perform a conditional put operation (add a new item if one with the specified primary key doesn't exist)"
Note
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
Also see DynamoDB: updateItem only if it already exists
If you really need to know whether the item exists or not so you can trigger your exception logic, then run a query first to see if the item already exists and don't even call put_item. You can also explore whether using a combination of ConditionExpression and one of the ReturnValues options (for put_item or update_item) may return enough data for you to know if an item existed.
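A minimal sketch of the conditional put described in the note, assuming `table` is a boto3 Table resource and that the partition key attribute is named "id" (both the helper name `put_if_absent` and the key name are assumptions). It reports a conflict instead of overwriting:

```python
def put_if_absent(table, item, partition_key="id"):
    """Insert item only if no item with the same key exists.

    Returns True when the item was written, False when it already existed.
    """
    try:
        table.put_item(
            Item=item,
            # Fails if an item with this primary key is already present.
            ConditionExpression="attribute_not_exists(#pk)",
            ExpressionAttributeNames={"#pk": partition_key},
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False
```

In the Lambda handler, a False result could be turned into an HTTP 409 response, which avoids a separate read before the write.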

How to get all the Sort keys for a given Partition key (HASH) efficiently?

I am new to DynamoDB and I am coming from an RDBMS background. Is there any way to get all the sort keys (RANGE) for a given partition key (HASH)? I am not interested in the data, just the sort keys. What is an efficient way to achieve this?
I don't know if it's possible to do exactly as you asked but you could add the sort key value as a separate column in the table.
Perhaps it would be simpler to have two separate columns in the table, one for your partition key and one for your range/sort key. Create a secondary index on the partition key to query and then return values from your new column representing your sort key.
I'm assuming that the HashKey and RangeKey were specified when the DynamoDB table was created. You can use DynamoDB's Query API and specify the range key's attribute name in the AttributesToGet field of the request (a legacy parameter; ProjectionExpression is its replacement). Please use the pagination support provided by the Query API, or your system will suffer when a large number of values is returned.
You can improve @Chris McLaughlin's solution by adding a ProjectionExpression attribute to the query. ProjectionExpression needs to be a string that identifies one ("attribute_name") or more ("attribute_name1,attribute_name2") attributes to retrieve from the table.
response = table_object.query(
    KeyConditionExpression = Key(partition_key_name).eq(partition_key_value),
    ProjectionExpression = sort_key_name
)
This will give you all the sort keys for that partition key. It is not necessary to create an additional column to do this, since the sort key is already an attribute of every item in the table.
You can use KeyConditionExpression as part of the DynamoDB Query API.
Here is roughly how you could do it in python:
import boto3
from boto3.dynamodb.conditions import Key
from botocore.exceptions import ClientError
# Wrapped in a function so that the return statements are valid.
def get_sort_keys(table_name, partition_key_name, partition_key_value, sort_key_name):
    session = boto3.session.Session(region_name = 'us-east-1')
    dynamodb = session.resource('dynamodb')
    table_object = dynamodb.Table(table_name)
    return_list = []
    try:
        response = table_object.query(
            KeyConditionExpression = Key(partition_key_name).eq(partition_key_value),
            ProjectionExpression = sort_key_name
        )
    except ClientError:
        return False
    if 'Items' in response:
        for response_result in response['Items']:
            if sort_key_name in response_result:
                return_list.append(response_result.get(sort_key_name))
        return return_list
    else:
        return False
Updated thanks to @Hernan for suggesting including ProjectionExpression.
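The pagination advice above can be combined with ProjectionExpression. The following is a sketch rather than a definitive implementation: it assumes `table` is a boto3 Table resource, uses string expressions with placeholder names (which also sidesteps reserved-word clashes), and the helper name `query_sort_keys` is invented for illustration.

```python
def query_sort_keys(table, pk_name, pk_value, sk_name):
    """Collect every sort key stored under one partition key."""
    kwargs = {
        "KeyConditionExpression": "#pk = :pk",
        "ProjectionExpression": "#sk",  # fetch only the sort key attribute
        "ExpressionAttributeNames": {"#pk": pk_name, "#sk": sk_name},
        "ExpressionAttributeValues": {":pk": pk_value},
    }
    sort_keys = []
    while True:
        page = table.query(**kwargs)
        sort_keys.extend(item[sk_name] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        # More results remain: continue from where this page ended.
        kwargs["ExclusiveStartKey"] = last_key
    return sort_keys
```

Note that the projection reduces the data returned to the client, while the read capacity consumed is still based on the size of the items read.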

Fulltext Search DynamoDB

Following situation:
I'm storing elements in DynamoDB for my customers. The hash key is an element ID and the range key is the customer ID. In addition to these fields I'm storing an array of strings -> tags (e.g. ["Pets", "House"]) and a multiline text.
I want to provide a search function in my application, where the user can type free text or select tags and get all related elements.
In my opinion a plain DB query is not the correct solution. I was playing around with CloudSearch, but I'm not really sure if this is the correct solution, because every time the user adds a tag the index must be updated...
I hope you have some hints for me.
DynamoDB is now integrated with Elasticsearch, enabling you to perform full-text queries on your data.
https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-dynamodb-elasticsearch-integration/
DynamoDB streams are used to keep the search index up-to-date.
You can use an instant-search engine like Typesense to search through data in your DynamoDB table:
https://github.com/typesense/typesense
There's also Elasticsearch, but it has a steep learning curve and can become a beast to manage, given the number of features and configuration options it supports.
At a high level:
Turn on DynamoDB streams
Setup an AWS Lambda trigger to listen to these change events
Write code inside your lambda function to index data into Typesense:
import typesense
def lambda_handler(event, context):
    client = typesense.Client({
        'nodes': [{
            'host': '<Endpoint URL>',
            'port': '<Port Number>',
            'protocol': 'https',
        }],
        'api_key': '<API Key>',
        'connection_timeout_seconds': 2
    })
    processed = 0
    for record in event['Records']:
        ddb_record = record['dynamodb']
        if record['eventName'] == 'REMOVE':
            res = client.collections['<collection-name>'].documents[str(ddb_record['OldImage']['id']['N'])].delete()
        else:
            document = ddb_record['NewImage'] # format your document here and then use the upsert function to index it.
            res = client.collections['<collection-name>'].documents.upsert(document)
        print(res)
        processed = processed + 1
    print('Successfully processed {} records'.format(processed))
    return processed
Here's a detailed article from Typesense's docs on how to do this: https://typesense.org/docs/0.19.0/guide/dynamodb-full-text-search.html
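The "format your document here" step matters because stream images arrive in DynamoDB's typed JSON, such as {"S": "text"} or {"N": "7"}. Below is a hand-rolled sketch of that conversion, for illustration only. boto3 ships a TypeDeserializer in boto3.dynamodb.types that covers the same ground. The function names and the "id" attribute are assumptions, and binary types are deliberately left out.

```python
def _number(raw):
    """DynamoDB sends numbers as strings; prefer int, fall back to float."""
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def unmarshal(value):
    """Convert one DynamoDB-typed value, e.g. {"S": "x"}, to plain Python."""
    type_tag, raw = next(iter(value.items()))
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        return _number(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "SS":
        return list(raw)
    if type_tag == "NS":
        return [_number(n) for n in raw]
    if type_tag == "L":
        return [unmarshal(v) for v in raw]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in raw.items()}
    raise ValueError("unsupported DynamoDB type: " + type_tag)


def image_to_document(image, id_attribute="id"):
    """Flatten a stream NewImage into a dict suitable for indexing."""
    document = {name: unmarshal(value) for name, value in image.items()}
    # Typesense expects the document id to be a string.
    document["id"] = str(document[id_attribute])
    return document
```

In the handler above, `image_to_document(ddb_record['NewImage'])` would produce the document passed to upsert.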
DynamoDB just added PartiQL, a SQL-compatible language for querying data. You can use the contains() function to find a value within a set (or a substring): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-functions.contains.html
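A rough sketch of running such a statement through boto3's low-level client with execute_statement. The table name passed in and the "tags" attribute are assumptions taken from the question, and `find_by_tag` is a made-up helper. The linked page documents contains() for strings and sets, so whether it matches depends on how the tags are actually stored. Without a key condition the statement reads the whole table, much like a scan.

```python
def find_by_tag(client, table_name, tag):
    """Return items whose "tags" attribute contains the given tag."""
    kwargs = {
        "Statement": f'SELECT * FROM "{table_name}" WHERE contains("tags", ?)',
        # The low-level client expects typed attribute values.
        "Parameters": [{"S": tag}],
    }
    items = []
    while True:
        page = client.execute_statement(**kwargs)
        items.extend(page.get("Items", []))
        token = page.get("NextToken")
        if not token:
            break
        # Results are paginated; pass the token back to continue.
        kwargs["NextToken"] = token
    return items
```

A call might look like `find_by_tag(boto3.client("dynamodb"), "Elements", "Pets")`, where "Elements" is a hypothetical table name. Items come back in DynamoDB's typed JSON form.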
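In your specific case you need Elasticsearch. But you can do a prefix search on the sort key. The SQL below, from the AWS documentation, shows the kind of query meant, and the linked page gives the DynamoDB equivalents: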
/* Return all of the songs by an artist, matching first part of title */
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE 'Call%';
/* Return all of the songs by an artist, with a particular word in the title...
...but only if the price is less than 1.00 */
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE '%Today%'
AND Price < 1.00;
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Query.html
This is the advantage of using DynamoDB as a managed service from AWS: you get multiple managed components in addition to the managed NoSQL database.
If you are using the downloadable version of DynamoDB, you need to build your own Elasticsearch cluster and index the DynamoDB data into it yourself.

Boto scanning a dynamodb table from item 1000 to 2000

I have a DynamoDB table, and I want to build a page where I can see the items in the table. Since it could have tens of thousands of items, I want to show 10 items per page. How do I do that? How do I scan items 1000 to 2000?
import boto
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
res = table.scan(attributes_to_get=['id'], max_results=10)
for i in res:
    print(i)
What do you mean by items 1000~2000?
There is no global order of hash keys (primary or index), so it's hard to define items 1000~2000 in advance.
However, it makes perfect sense to ask for the next 1000 items, given the last returned item. To fetch the next page, you execute the scan method again, providing the primary key value of the last item in the previous page, so that the scan returns the next set of items. The parameter name is exclusive_start_key and its initial value is None.
See more details in official docs.
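The same idea in boto3, as a sketch: the question uses the legacy boto library, so switching to boto3 here is an assumption, as is the helper name `scan_page`. In boto3 the parameter is called ExclusiveStartKey and the key to resume from is returned as LastEvaluatedKey.

```python
def scan_page(table, page_size=10, start_key=None):
    """Fetch one page of items and the key needed to fetch the next page.

    Returns (items, next_key); next_key is None after the last page.
    """
    kwargs = {"Limit": page_size}
    if start_key is not None:
        # Continue right after the last item of the previous page.
        kwargs["ExclusiveStartKey"] = start_key
    response = table.scan(**kwargs)
    return response.get("Items", []), response.get("LastEvaluatedKey")
```

A web page would keep `next_key` between requests, for example in a link parameter, and pass it back as `start_key`. Jumping straight to an arbitrary page is not possible this way, because each page's start key comes from the page before it.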