Do we need a cache for an array? - Django

We're developing a web-based project with Django, and we cache database operations to improve performance. But I'm wondering whether we also need to cache an array.
The code looks like this:
ABigArray = {
    "1": {
        "name": "xx",
        "gender": "xxx",
        ...
    },
    "2": {
        ...
    },
    ...
}
class Items:
    def __init__(self):
        self.data = ABigArray

    def get_item_by_id(self, id):
        item = cache.get("item" + str(id))  # get the cached item if possible
        if item:
            return item
        else:
            item = self.data.get(str(id))
            cache.set("item" + str(id), item)
            return item
So I'm wondering whether we really need such a cache, since IMO the array (ABigArray) is already loaded in memory when we try to get one item, so we don't need the cache in this case, right? Or am I wrong?
Please correct me if I'm wrong.
Thanks.

You've cut out a bit too much information, but it looks like the "array" (actually a dictionary) is always the same - there's a single instance that is created when the module is first imported, and will be used by every Items object. So there's absolutely nothing to be gained by caching it - in fact you will lose by doing so, as you will introduce an unnecessary round trip to get the data from the cache.
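A minimal sketch of that point (a hedged illustration reusing the question's names, not the original author's code): since ABigArray is a module-level dict that already lives in the process's memory, a plain dictionary lookup is all that is needed, and the cache round trip can simply be dropped:

ABigArray = {
    "1": {"name": "xx", "gender": "xxx"},
    # ... more entries ...
}

class Items:
    def __init__(self):
        # the module-level dict, already resident in memory for this process
        self.data = ABigArray

    def get_item_by_id(self, id):
        # a dict lookup is O(1) and avoids the network round trip
        # that cache.get()/cache.set() would add
        return self.data.get(str(id))

Caching would only start to pay off if building ABigArray itself were expensive (for example, derived from database queries) or if it had to be shared and refreshed across processes.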

Related

Why does the following DynamoDB write with a conditional expression succeed?

I have the following code to create a DynamoDB table:
def create_mock_dynamo_table():
    conn = boto3.client(
        "dynamodb",
        region_name=REGION,
        aws_access_key_id="ak",
        aws_secret_access_key="sk",
    )
    conn.create_table(
        TableName=DYNAMO_DB_TABLE,
        KeySchema=[
            {'AttributeName': 'PK', 'KeyType': 'HASH'},
            {'AttributeName': 'SK', 'KeyType': 'RANGE'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'PK', 'AttributeType': 'S'},
            {'AttributeName': 'SK', 'AttributeType': 'S'}
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    mock_table = boto3.resource('dynamodb', region_name=REGION).Table(DYNAMO_DB_TABLE)
    return mock_table
Then I use it to put two items:
mock_table = create_mock_dynamo_table()
mock_table.put_item(
    Item={
        'PK': 'did:100000001',
        'SK': 'weekday:monday:start_time:00:30',
    }
)
mock_table.put_item(
    Item={
        'PK': 'did:100000001',
        'SK': 'weekday:monday:start_time:00:40',
    },
    ConditionExpression='attribute_not_exists(PK)'
)
When I do the second put_item, the PK is already in the table and only the sort key is different. But the condition I am setting only checks for the existence of the same PK, so the second put_item should fail, right?
The condition check for PutItem does not check the condition against arbitrary items. It only checks the condition against an item with the same primary key (hash and sort keys), if such an item exists.
In your case, the value of the sort key is different, so when you put the second item, DynamoDB sees that no item exists with that key, therefore the PK attribute does not exist.
This is also why the condition check fails the second time you run the code—because at that point you do already have an item with the same hash and sort keys.
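To illustrate the point (a sketch, not part of the original post): a put_item that reuses both the same PK and the same SK is the case the condition actually guards against, and it raises ConditionalCheckFailedException:

from botocore.exceptions import ClientError

try:
    mock_table.put_item(
        Item={
            'PK': 'did:100000001',
            'SK': 'weekday:monday:start_time:00:40',  # same PK *and* same SK as an existing item
        },
        ConditionExpression='attribute_not_exists(PK)'
    )
except ClientError as err:
    # the condition is evaluated only against the existing item with this exact key
    assert err.response['Error']['Code'] == 'ConditionalCheckFailedException'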
DynamoDB's "IOPS" is very low and the actual write takes some time. You can read more about it here. But, if you run the code a second time soon after, you'll see that you'll get the expected botocore.errorfactory.ConditionalCheckFailedException.
If I may refer to what I think you're trying to do - mock a DB + data. When you want to mock such an "expensive" resource, make an actual fake class. You'll want to wrap all your DB accesses in the actual code with some kind of dal.py module that consolidates operations such as write/read/etc. Then, you mock those methods/functions.
You don't want to write code so tightly coupled with the chosen DB.
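For example, a minimal sketch of that dal.py idea (the ScheduleDal/FakeScheduleDal names and methods are purely illustrative, not taken from the question's code):

# dal.py - consolidates all DynamoDB access behind one small interface
class ScheduleDal:
    def __init__(self, table):
        self._table = table

    def put_slot(self, device_id, slot_key):
        self._table.put_item(Item={'PK': device_id, 'SK': slot_key})

    def get_slots(self, device_id):
        resp = self._table.query(
            KeyConditionExpression='PK = :pk',
            ExpressionAttributeValues={':pk': device_id},
        )
        return resp['Items']

# tests - an in-memory stand-in used instead of DynamoDB
class FakeScheduleDal:
    def __init__(self):
        self._items = {}

    def put_slot(self, device_id, slot_key):
        self._items[(device_id, slot_key)] = {'PK': device_id, 'SK': slot_key}

    def get_slots(self, device_id):
        return [item for (pk, _), item in self._items.items() if pk == device_id]

Application code depends only on the ScheduleDal interface, so unit tests swap in FakeScheduleDal and never touch boto3.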
The best practice is using an ORM framework such as SQLAlchemy. It is invaluable to take the time now to learn it. But, you might have time constraints I'm not aware of.

Graphene-Django - how to pass an argument from Query to class DjangoObjectType

First of all, thanks! It has been a year since I last asked a question, because I always found an answer here. You're a tremendous help.
Today I have a question I cannot sort out myself.
I hope you will be kind enough to help me on the matter.
Context: I work on a project with the Django framework, and I have some dynamic pages built with React.js. The API in between is GraphQL based: Apollo for the client, graphene-django for the back end.
I want to build a dynamic page from a GraphQL query that contains a nested set (a field declared on the DjangoObjectType class and resolved from a Django queryset), and I want to filter the parent dynamically with argument A and the nested set with argument B. My problem is finding a way to pass argument B down to the nested set so it can be filtered.
The GraphQL query I would like to achieve, based on the GraphQL documentation:
query DistributionHisto($id: ID, $limit: Int) {
  distributionHisto(id: $id) {
    id
    historical(limit: $limit) {
      id
      date
      histo
    }
  }
}
But I don't understand how to pass (limit:$limit) to my set in the back end.
Here is my schema.py:
import graphene
from graphene_django.types import DjangoObjectType

# DistributionTuple, HistoricalTimeSeries and HistoricalTimeSeriesType come from
# the project's models/types modules (imports not shown in the original post)

class DistributionType(DjangoObjectType):
    class Meta:
        model = DistributionTuple

    historical = graphene.List(HistoricalTimeSeriesType)

    def resolve_historical(self, info):
        return HistoricalTimeSeries.objects.filter(
            distribution_tuple_id=self.id
        ).order_by('date')[:2]

class Query(object):
    distribution_histo = graphene.List(
        graphene.NonNull(DistributionType),
        id=graphene.ID(),
        limit=graphene.Int()
    )

    def resolve_distribution_histo(self, info, id=None, limit=None):
        filter_q1 = {'id': id} if id else {}
        return DistributionTuple.objects.filter(**filter_q1)
I have tried a few things, but I haven't found a way to make it work so far.
At the moment, as you can see, the limit argument reaches a dead end in resolve_distribution_histo, whereas ideally it would be passed up to the DistributionType class, where it would replace the slice [:2] with [:limit] in resolve_historical().
I hope I have been clear; please let me know if that's not the case.
Thanks for your support.
This topic is called pagination.
Front-end selection:
const { loading, error, data, fetchMore } = useQuery(GET_ITEMS, {
  variables: {
    offset: 0,
    limit: 10,
  },
});
Back-end selection:
Slicing the queryset returns only the requested rows; [:10] below limits the result to the first 10 elements (note that QuerySet.count() takes no argument and only returns the number of rows, so it cannot be used to limit a query).
DistributionTuple.objects.filter(**filter_q1)[:10]
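To connect this to the schema above: a sketch (using graphene's standard field arguments; not from the original answer) of declaring limit directly on the nested historical field, so the value from historical(limit: $limit) is delivered to that field's own resolver instead of dead-ending in resolve_distribution_histo:

class DistributionType(DjangoObjectType):
    class Meta:
        model = DistributionTuple

    # declare the argument on the nested field itself
    historical = graphene.List(HistoricalTimeSeriesType, limit=graphene.Int())

    def resolve_historical(self, info, limit=None):
        qs = HistoricalTimeSeries.objects.filter(
            distribution_tuple_id=self.id
        ).order_by('date')
        # fall back to the original hard-coded window when no limit is given
        return qs[:limit] if limit else qs[:2]

With this in place the query shown in the question should work as written, and the limit argument can be dropped from the top-level distribution_histo field.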

Optimizing database queries in Django

I have a bit of code that is causing my page to load pretty slow (49 queries in 128 ms). This is the landing page for my site -- so it needs to load snappily.
The following is my views.py that creates a feed of latest updates on the site and is causing the slowest load times from what I can see in the Debug toolbar:
def product_feed(request):
    """ Return all site activity from friends, etc. """
    latestparts = Part.objects.all().prefetch_related('uniparts').order_by('-added')
    latestdesigns = Design.objects.all().order_by('-added')
    latest = list(latestparts) + list(latestdesigns)
    latestupdates = sorted(latest, key=lambda x: x.added, reverse=True)
    latestupdates = latestupdates[0:8]
    # only get the unique avatars that we need to put on the page,
    # so we're not pinging for images for each update
    uniqueusers = User.objects.filter(id__in=Part.objects.values_list('adder', flat=True))
    return render_to_response("homepage.html", {
        "uniqueusers": uniqueusers,
        "latestupdates": latestupdates
    }, context_instance=RequestContext(request))
The line that takes the most time seems to be:
latest = list(latestparts) + list(latestdesigns) (25 ms)
There are two others at 17 ms (sitewide announcements) and 25 ms (adding tagged items to each product feed item) that I am also investigating.
Does anyone see any ways in which I can optimize the loading of my activity feed?
You never need more than 8 items, so limit your queries. And don't forget to make sure that added in both models is indexed.
latestparts = Part.objects.all().prefetch_related('uniparts').order_by('-added')[:8]
latestdesigns = Design.objects.all().order_by('-added')[:8]
For bonus marks, eliminate the magic number.
After making those queries a bit faster, you might want to check out memcache to store the most common query results.
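A rough sketch of that idea using Django's low-level cache API (the 'homepage_feed' key, the FEED_SIZE constant and the 60-second timeout are arbitrary illustrations, not from the original post):

from django.core.cache import cache

FEED_SIZE = 8  # replaces the magic number

def latest_feed_items():
    items = cache.get('homepage_feed')
    if items is None:
        parts = Part.objects.prefetch_related('uniparts').order_by('-added')[:FEED_SIZE]
        designs = Design.objects.order_by('-added')[:FEED_SIZE]
        items = sorted(
            list(parts) + list(designs),
            key=lambda x: x.added,
            reverse=True,
        )[:FEED_SIZE]
        cache.set('homepage_feed', items, 60)  # cache the combined feed for 60 seconds
    return items

The cached value is rebuilt at most once a minute, so the two feed queries stop running on every page view.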
Moreover, I believe adder is a ForeignKey to the User model.
Part.objects.distinct().values_list('adder', flat=True)
The line above is a QuerySet of unique adder values, which I believe is exactly what you meant. It saves you performing a subquery.

Hierarchical cache in Django

What I want to do is mark some values in the cache as related so that I can delete them all at once. For example, when I insert a new entry into the database, I want to delete everything in the cache that was based on the old database values.
I could always use cache.clear(), but that seems too brutal to me. Or I could store related values together in a dictionary and cache that dictionary. Or I could maintain some kind of index in an extra cache field. But everything seems too complicated to me (and possibly slow?).
What do you think? Is there an existing solution, or is my approach wrong? Thanks for any answers.
Are you using the cache API? It sounds like it.
This post, which pointed me to these slides, helped me create a nice generational caching system which let me create the hierarchy I wanted.
In short, you store a generation key (such as group) in your cache and incorporate the value stored into your key creation function so that you can invalidate a whole set of keys at once.
With this basic concept you could create highly complex hierarchies or just a simple group system.
For example:
import hashlib

from django.core.cache import cache

class Cache(object):
    def generate_cache_key(self, key, group=None):
        """
        Generate a cache key, relating keys via an outside source (group).
        Generates a key such as 'key-your-key-here:somehow_related-1', then hashes it.
        Note: consider this pseudo code and definitely incomplete code.
        """
        key_fragments = [('key', key)]
        if group:
            # the group's current generation number becomes part of the key
            key_fragments.append((group, cache.get(group, '1')))
        combined_key = ":".join('%s-%s' % (name, value) for name, value in key_fragments)
        hashed_key = hashlib.md5(combined_key.encode('utf-8')).hexdigest()
        return hashed_key

    def increment_group(self, group):
        """
        Invalidate an entire group by bumping its generation number
        """
        # cache.incr() raises ValueError if the key is missing, so create it first
        cache.add(group, 1)
        cache.incr(group)

    def set(self, key, value, group=None):
        key = self.generate_cache_key(key, group)
        cache.set(key, value)

    def get(self, key, group=None):
        key = self.generate_cache_key(key, group)
        return cache.get(key)

# example
>>> c = Cache()
>>> c.set('key', 'value', 'somehow_related')
>>> c.set('key2', 'value2', 'somehow_related')
>>> c.increment_group('somehow_related')
>>> c.get('key')   # both invalidated
>>> c.get('key2')  # both invalidated
Caching a dict or something serialised (with JSON or the like) sounds good to me. The cache backends are key-value stores like memcache; they aren't hierarchical.
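For completeness, a tiny sketch of that simpler approach (the 'related_values' key name is just an example): keep the related values together in one serialised dict, so a single cache.delete() invalidates them all.

import json

from django.core.cache import cache

related = {'a': 1, 'b': 2, 'c': 3}
cache.set('related_values', json.dumps(related), 300)

# later: read the whole group back ...
stored = cache.get('related_values')
values = json.loads(stored) if stored else {}

# ... and when the underlying database rows change, drop everything at once
cache.delete('related_values')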

Why does Django give me different results for the same query?

For a mock web service, I wrote a little Django app that serves as a web API, which my Android application queries. When I make requests to the API, I can also hand over an offset and a limit so that only the really necessary data is transmitted. Anyway, I ran into the problem that Django gives me different results for the same query to the API. It seems as if the results are returned round robin.
This is the Django code that will be run:
def getMetaForCategory(request, offset, limit):
    if request.method == "GET":
        result = {"meta_information": []}
        categoryIDs = request.GET.getlist("category_ids[]")
        categorySet = set(toInt(categoryIDs))
        categories = Category.objects.filter(id__in=categoryIDs)

        metaSet = set([])
        for category in categories:
            metaSet = metaSet | set(category.meta_information.all())
        metaList = list(metaSet)
        metaList.sort()

        for meta in metaList[int(offset):int(limit)]:
            relatedCategoryIDs = getIDs(meta.category_set.all())
            item = {
                "_id": meta.id,
                "name": meta.name,
                "type": meta.type,
                "categories": list(categorySet & set(relatedCategoryIDs))
            }
            result['meta_information'].append(item)
        return HttpResponse(content=simplejson.dumps(result), mimetype="application/json")
    else:
        return HttpResponse(status=403)
What happens is the following: if all MetaInformation objects were Foo, Bar, Baz and Blib and I set the limit to 0:2, then the first request would return [Foo, Bar], and the exact same request would return [Baz, Blib] the second time it runs.
Does anyone see what I am doing wrong here? Or is it the Django cache that somehow gets into my way?
I think the difficulty is that you are using a set to store your objects, and slicing that - and sets have no ordering (they are like dictionaries in that way). So, the results from your query are in fact indeterminate.
There are various implementations of ordered sets around - you could look into using one of them. However, I must say that I think you are doing a lot of unnecessary and expensive unique-ifying and sorting in Python, when most of this could be done directly by the database. For instance, you seem to be trying to get the unique list of Metas that are related to the categories you pass. Well, this could be done in a single ORM query:
meta_list = MetaInformation.objects.filter(category__id__in=categoryIDs)
and you could then drop the set, looping and sorting commands.
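Putting that together, a sketch of what the view body could look like (the .distinct() call and the 'name' ordering field are assumptions; adjust them to the actual model):

def getMetaForCategory(request, offset, limit):
    if request.method != "GET":
        return HttpResponse(status=403)

    categoryIDs = request.GET.getlist("category_ids[]")
    categorySet = set(toInt(categoryIDs))

    # one query: unique, ordered and sliced in the database
    metas = (MetaInformation.objects
             .filter(category__id__in=categoryIDs)
             .distinct()
             .order_by('name')[int(offset):int(limit)])

    result = {"meta_information": []}
    for meta in metas:
        relatedCategoryIDs = getIDs(meta.category_set.all())
        result['meta_information'].append({
            "_id": meta.id,
            "name": meta.name,
            "type": meta.type,
            "categories": list(categorySet & set(relatedCategoryIDs)),
        })
    return HttpResponse(content=simplejson.dumps(result), mimetype="application/json")

Because the ordering now comes from order_by(), the same offset and limit always return the same window of rows.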