I have a DynamoDB table and want to build a page that lists its items. Since the table could hold tens of thousands of items, I want to show them 10 per page. How do I do that? And how would I scan items 1000 to 2000?
import boto
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
res = table.scan(attributes_to_get=['id'], max_results=10)
for i in res:
    print(i)
What do you mean by items 1000~2000?
There is no global order over hash keys (primary or index), so it is hard to define "items 1000~2000" in advance.
However, it makes perfect sense to ask for the next 1000 items given the last item returned. To fetch the next page, call scan again and pass the primary key of the last item of the previous page, so that the scan resumes right after it. The parameter is called exclusive_start_key, and its initial value is None.
See more details in official docs.
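As a rough sketch of that page-by-page loop: the helper below assumes the newer boto3 Table resource rather than the legacy boto call from the question. In boto3 the parameter is spelled ExclusiveStartKey, and the key to resume from comes back in the response as LastEvaluatedKey. The function name scan_page and the table name in the usage comment are illustrative only.

```python
def scan_page(table, page_size=10, start_key=None):
    """Fetch one page of items from a boto3 DynamoDB Table resource.

    Returns (items, next_key); next_key is None once the scan is exhausted.
    """
    kwargs = {"Limit": page_size}
    if start_key is not None:
        # Resume right after the last key evaluated by the previous call.
        kwargs["ExclusiveStartKey"] = start_key
    response = table.scan(**kwargs)
    return response.get("Items", []), response.get("LastEvaluatedKey")


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   table = boto3.resource("dynamodb").Table("MyTable")
#   items, next_key = scan_page(table)                # page 1
#   items, next_key = scan_page(table, 10, next_key)  # page 2
```

Because each page depends on the key returned by the previous one, a page such as "items 1000 to 2000" can only be reached by walking through the earlier pages first.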
So I'm following the single-table design, and my PK values have the format below:
P#product1
P#product2
P#product3
W#warehouse1
W#warehouse2
P#product4
....
For the query pattern "get all products", I need to run a Scan filtered on PK begins_with "P#", and I'm not sure this is the ideal approach.
I understand Scan is resource-consuming (and I would love not to have to rely on it).
Not to mention that adding a limit and pagination makes the scenario even more cumbersome, because the limit is applied before the filter: the first scan with a limit of 10 may return only 3 products, the next one only 2, and so on.
Is there a more straightforward approach? With, say, 87 products among 1000 records, I was hoping to still get 9 pages of up to 10 products each.
I came across other forum topics and found this solution, which uses a DynamoDB Global Secondary Index (GSI).
Basically:
Add an attribute, say entitytype (with values such as product, warehouse, ...).
Create a Global Secondary Index with:
GSI PK: entitytype
GSI SK: the original PK
The GSI then contains the following:
product P#product1
product P#product2
warehouse W#warehouse1
We can then run a Query against this GSI with entitytype = product, which reads only the product items.
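A minimal sketch of such a paginated Query, assuming a boto3 Table resource and a GSI whose partition key is entitytype. The index name "entitytype-index" and the function name are made up for illustration. Since no filter expression is involved, Limit should count only matching items here, so each page holds up to page_size products.

```python
def query_entity_page(table, entity_type, page_size=10, start_key=None,
                      index_name="entitytype-index"):  # index name is hypothetical
    """Fetch one page of items of a given entity type from the GSI.

    Returns (items, next_key); next_key is None when there are no more pages.
    """
    kwargs = {
        "IndexName": index_name,
        # Key condition on the GSI partition key only: no filter is applied.
        "KeyConditionExpression": "entitytype = :t",
        "ExpressionAttributeValues": {":t": entity_type},
        "Limit": page_size,
    }
    if start_key is not None:
        kwargs["ExclusiveStartKey"] = start_key
    response = table.query(**kwargs)
    return response.get("Items", []), response.get("LastEvaluatedKey")


# Hypothetical usage:
#   items, next_key = query_entity_page(table, "product")
#   items, next_key = query_entity_page(table, "product", start_key=next_key)
```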
When I save a new item, I need to find the item with the highest ID in the database, add 1 to that ID, and save the new item with it. Simply counting the items in the DB will not work, because the count becomes wrong as soon as an item is deleted.
I have no code to fix, but the pseudocode looks something like:
look at all the items in the DB
Find the item with the highest ID
Add one to that number
save the new item with the new highest id in the DB
I am using Django, so the solution should use Django querysets and/or Python.
The id field of a Django model is auto-incrementing by default, so whenever you save a new object the database assigns it the next id automatically, which is exactly what you want.
Anyway, there are several ways to retrieve the latest id from the database.
The most efficient one (the simplest and fastest query, since only the id value is returned rather than the whole object) is:
latest_id = Model.objects.all().values_list('id', flat=True).order_by('-id').first()
The resulting SQL query looks like this:
SELECT "model"."id" FROM "model" ORDER BY "model"."id" DESC LIMIT 1;
all() gets all objects of the model from the database, values_list('id', flat=True) extracts only the value of the id field (this saves you time because you don't retrieve all model fields), order_by('-id') orders objects by id in descending order and first() gives you the desired result which is the last id.
There is also last(), which does the opposite of first(): it retrieves the last whole object of the model from the database. latest('id') does the same.
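To see the two points above without a Django project, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for the ORM. The table name model and the name column are placeholders. It shows that the row count drifts after a delete, that the ORDER BY ... DESC LIMIT 1 query still finds the highest id, and that the database hands out the next id by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite hand out ever-increasing ids that are never reused.
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for name in ("a", "b", "c"):
    conn.execute("INSERT INTO model (name) VALUES (?)", (name,))

# Delete a row: the row count drops, the highest id does not.
conn.execute("DELETE FROM model WHERE id = 2")
row_count = conn.execute("SELECT COUNT(*) FROM model").fetchone()[0]
latest_id = conn.execute(
    "SELECT id FROM model ORDER BY id DESC LIMIT 1"
).fetchone()[0]

# The database assigns the next id on insert; no manual "max + 1" is needed.
new_id = conn.execute("INSERT INTO model (name) VALUES (?)", ("d",)).lastrowid

print(row_count, latest_id, new_id)
```

Letting the database assign the id also avoids the case where two requests read the same "highest id" at the same time and both try to save highest + 1.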
I want to retrieve all the items from my table without specifying any particular parameter. I can fetch a single item by its key, but I want to get all items. How do I do that?
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Email')
response = table.get_item(
    Key={
        "id": "2"
    }
)
item = response['Item']
print(item)
This works for a single item, but how do I retrieve all items? Is there a method for that?
If you want to retrieve all items you will need to use the Scan command.
You can do this by running
response = table.scan()
Be aware that running this can consume a large number of read capacity units (RCUs). With eventually consistent reads, 1 RCU covers 2 items (of up to 4 KB each); with strongly consistent reads, 1 RCU covers 1 item (of up to 4 KB).
Here is the consideration page for scans vs queries in AWS documentation.
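One caveat: a single Scan call returns at most 1 MB of data, so table.scan() on its own may return only part of a larger table. A minimal sketch of a loop that follows LastEvaluatedKey until the table is exhausted, assuming a boto3 Table resource (the helper name scan_all is illustrative):

```python
def scan_all(table, **scan_kwargs):
    """Yield every item in a table, following LastEvaluatedKey across pages."""
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        for item in response.get("Items", []):
            yield item
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No key returned: this was the final page.
            break
        kwargs["ExclusiveStartKey"] = last_key


# Hypothetical usage with the table from the question:
#   for item in scan_all(table):
#       print(item)
```

Every page read this way consumes read capacity, so the RCU warning above applies to the whole loop.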
I'm using Django with Postgres.
On a page I can show a list of featured items, let's say 10.
If the database has more than 10 featured items, I want to pick 10 of them at random (or, better, rotate them).
If there are fewer than 10 featured items, take all of them and fill the list up to 10 with non-featured items.
Because random ordering takes more time in the database, I do the sampling in Python:
import random

count = Item.objects.filter(is_featured=True).count()
if count >= 10:
    # random.sample takes the sample size as its second argument
    item = random.sample(list(Item.objects.filter(is_featured=True)), 10)
else:
    item = list(Item.objects.all()[:10])
The code above misses the case where there are fewer than 10 featured items (for example, with 8 featured items it should add 2 non-featured ones).
I could add another query, but I don't know whether using 4-5 queries for this is an efficient way to retrieve the items.
The best solution I could find is this:
from itertools import chain
items = list(chain(Item.objects.filter(is_featured=True).order_by('?'), Item.objects.filter(is_featured=False).order_by('?')))[:10]
This way the order of the querysets is retained; the downside is that items becomes a list, not a QuerySet. You can see more details in this SO answer. FYI: there are other good solutions, such as combining querysets with Q objects or the pipe operator (|), but they don't retain the order of the querysets.
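If the sampling is done in Python, as the question does, the "top up with non-featured items" case can be handled on plain lists. This is only a sketch: pick_items is a made-up helper, and it assumes both arguments are already lists (for example, evaluated querysets).

```python
import random


def pick_items(featured, others, k=10):
    """Return up to k items: random featured ones first, then non-featured filler."""
    if len(featured) >= k:
        # Enough featured items: sample k of them at random.
        return random.sample(featured, k)
    chosen = list(featured)
    random.shuffle(chosen)
    missing = k - len(chosen)
    # Fill the remaining slots with randomly chosen non-featured items.
    chosen += random.sample(others, min(missing, len(others)))
    return chosen


featured = ["f%d" % i for i in range(8)]
others = ["n%d" % i for i in range(20)]
print(pick_items(featured, others))  # 8 featured items followed by 2 non-featured
```

With Django this could presumably be fed from two queries, such as list(Item.objects.filter(is_featured=True)) and list(Item.objects.filter(is_featured=False)[:10]), rather than 4-5.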
SQL method: You can achieve that with an SQL statement like this:
SELECT uuid_generate_v4(), *
FROM table_name
ORDER BY NOT is_featured, uuid_generate_v4()
LIMIT 10;
Explanation: the generated UUID simulates randomness (for e-commerce purposes this should suffice). Sorting by NOT is_featured puts the featured rows on top, and if there are fewer than 10 of them, the remaining slots up to the limit are filled automatically with non-featured rows.
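The ordering trick can be tried out locally with Python's built-in sqlite3 module. This is a stand-in, not the Postgres setup from the question: SQLite's random() replaces uuid_generate_v4(), the boolean is stored as 0/1, and the table name item is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, is_featured INTEGER)")
# 8 featured rows followed by 20 non-featured rows.
rows = [(1,)] * 8 + [(0,)] * 20
conn.executemany("INSERT INTO item (is_featured) VALUES (?)", rows)

# Featured rows sort first (NOT 1 = 0); random() shuffles within each group.
picked = conn.execute(
    "SELECT id, is_featured FROM item "
    "ORDER BY NOT is_featured, random() LIMIT 10"
).fetchall()

print([flag for _, flag in picked])  # eight 1s, then two 0s
```

In the Django ORM, the same idea would presumably be written as Item.objects.order_by('-is_featured', '?')[:10], which keeps the result a QuerySet.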
How would one go about retrieving the last 1,000 rows from a database via objects.filter? My current query returns the first 1,000 rows that were entered into the database (i.e. with 10,000 rows it returns rows 1-1,000 instead of 9,001-10,000).
Current Code:
limit = 1000
Shop.objects.filter(ID = someArray[ID])[:limit]
Cheers
Solution:
queryset = Shop.objects.filter(id=someArray[id])
limit = 1000
count = queryset.count()
endoflist = queryset.order_by('timestamp')[max(count - limit, 0):]  # max() avoids a negative index when there are fewer than limit rows
endoflist is the queryset you want.
Efficiency:
The following is from the Django docs on the reverse() queryset method:
To retrieve the "last" five items in a queryset, you could do this:
my_queryset.reverse()[:5]
Note that this is not quite the same as slicing from the end of a sequence in Python. The above example will return the last item first, then the penultimate item and so on. If we had a Python sequence and looked at seq[-5:], we would see the fifth-last item first. Django doesn't support that mode of access (slicing from the end), because it's not possible to do it efficiently in SQL.
So I'm not sure if my answer is merely inefficient, or extremely inefficient. I moved the order_by to the final query, but I'm not sure if this makes a difference.
reversed(Shop.objects.filter(id=someArray[id]).reverse()[:limit])
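The ordering difference described in the quoted docs can be illustrated with a plain Python list standing in for rows ordered oldest to newest. No Django is involved here; the list slicing only mimics what the queryset calls return.

```python
seq = list(range(1, 11))  # stands in for rows ordered oldest to newest

# What my_queryset.reverse()[:5] resembles: newest first.
newest_first = list(reversed(seq))[:5]

# What Python's seq[-5:] gives: the same rows, still oldest to newest.
tail = seq[-5:]

# Reversing the "newest first" slice restores the original order, which is
# what wrapping the sliced queryset in reversed() does in the line above.
restored = list(reversed(newest_first))

print(newest_first)  # [10, 9, 8, 7, 6]
print(tail)          # [6, 7, 8, 9, 10]
print(restored)      # [6, 7, 8, 9, 10]
```

Note that reverse() only has an effect on a queryset with a defined ordering, so the filter would presumably need an order_by('timestamp') (or a default Meta ordering) before reverse() is applied.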