incr operation for a memcache key is getting reset every hour - python-2.7

I'm using memcache in GAE via the App Engine API. Its documentation doesn't give any info on setting an expiry time, but the value is getting reset every hour.
https://cloud.google.com/appengine/docs/standard/python/refdocs/google.appengine.api.memcache.html#google.appengine.api.memcache.Client.incr
from google.appengine.api import memcache

def count(key):
    newVal = memcache.incr(key, delta=1, initial_value=1)
    return newVal
I want the value to persist for 2 days. How can I achieve this?

Memcached is an in-memory cache. You can never be certain that an object in the cache will remain there.
There are many reasons why this could happen.
One: your cache is full and something tries to insert a new object, so memcache evicts the one that is least recently used.
There is a concept of slabs in memcache, where similar-sized objects are kept in one slab. It can be that your cache is not full but that particular slab is, and that is what evicts the object. You can read more about it here
One more reason could be that newer versions of memcache (I think after 1.5.0) have started to evict slabs when they have not been used for a long time and a new slab requires space.
The bottom line is that you should not rely on memcached to store your data. It's best to keep the source of your data in a database and use memcached as a cache only.
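Here is a minimal sketch of that approach on App Engine (Python 2.7), keeping the count in the datastore via ndb and using memcache only as a cache; the Counter model and helper names are illustrative assumptions, not part of the App Engine API:

from google.appengine.api import memcache
from google.appengine.ext import ndb

class Counter(ndb.Model):
    count = ndb.IntegerProperty(default=0)

@ndb.transactional
def _increment_in_datastore(key_name):
    counter = Counter.get_by_id(key_name)
    if counter is None:
        counter = Counter(id=key_name)
    counter.count += 1
    counter.put()
    return counter.count

def count(key_name):
    # durable increment first, then refresh the cached copy with a 2-day TTL
    new_val = _increment_in_datastore(key_name)
    memcache.set(key_name, new_val, time=2 * 24 * 3600)
    return new_val

def read_count(key_name):
    cached = memcache.get(key_name)
    if cached is not None:
        return cached
    counter = Counter.get_by_id(key_name)
    return counter.count if counter else 0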
Hope this helps

Related

Is retrieving data from a redis instance slower than retrieving the same from Django's request.session dictionary?

In a Python/Django application, is retrieving a value stored in redis slower than retrieving one stored in the request.session dictionary?
Background:
I have a Django app where I use DB-based sessions. I.e., instead of using django.contrib.sessions, I used this nifty little 3rd party library.
I recently ran a benchmark whereby I saved a test value in a local redis instance via the redis-py wrapper (i.e. my_server.set('test','1')). I saved the same test value in request.session['test'].
I then retrieved the test value from each, and compared the time taken. request.session outperformed redis by a factor exceeding 2x in this scenario.
Problem:
The application is not distributed in any way; everything is shared and happens on the same machine - a very vanilla setup.
The result appears counter-intuitive to me. Why? Because my sessions are DB based, and I thought redis would be faster than whatever Django has to offer. Clearly, I am wrong.
Can an expert explain what's actually going on here? Maybe the python wrapper on redis' core API is slowing things down?
In case you need more information, or are skeptical about how I ran the benchmark, please do ask.
P.S. I simply put the two competing reads in a for loop for 100K iterations and measured the time taken to complete (roughly as in the sketch below).
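For reference, here is a rough, self-contained version of that benchmark written as a Django view (so request.session is available); my_server mirrors the redis-py client from the question, and the numbers are purely illustrative:

import time

import redis
from django.http import HttpResponse

my_server = redis.StrictRedis(host='localhost', port=6379, db=0)

def benchmark(request):
    my_server.set('test', '1')
    request.session['test'] = '1'

    start = time.time()
    for _ in range(100000):
        my_server.get('test')        # each read is a socket round trip to redis
    redis_elapsed = time.time() - start

    start = time.time()
    for _ in range(100000):
        request.session['test']      # session is already loaded: plain dict lookup
    session_elapsed = time.time() - start

    return HttpResponse('redis: %.3fs, session: %.3fs'
                        % (redis_elapsed, session_elapsed))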
The session is stored as a single blob, not as individual keys. It has almost certainly already been loaded and decoded by the time you get into your view, most likely by the auth middleware. Once it is loaded it is stored locally as a dictionary, which is all that your timing tests will measure.

Advice on caching for Django/Postgres application

I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.
My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).
The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.
So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).
It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?
The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
Your query should in no way return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, such that the user sees only 10, 25, whatever results at a time. That then allows you to also limit your query to only fetch 10, 25, whatever records at a time starting from a particular index based on the page number.
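As a rough illustration of paging the query (the LogEntry model and field names below are assumptions, not taken from the question):

from django.core.paginator import Paginator
from django.db import models
from django.shortcuts import render

class LogEntry(models.Model):
    server = models.CharField(max_length=64)
    timestamp = models.DateTimeField(db_index=True)
    message = models.TextField()

def log_search(request):
    qs = LogEntry.objects.filter(
        server=request.GET['server'],
        timestamp__range=(request.GET['start'], request.GET['end']),
    ).order_by('timestamp')
    paginator = Paginator(qs, 25)                      # 25 rows per page
    page = paginator.page(int(request.GET.get('page', 1)))
    # only this page's rows are actually fetched (Postgres LIMIT/OFFSET)
    return render(request, 'logs/search.html', {'page': page})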
Caching search result pages is not a particularly good idea, regardless, though. For one, the odds that different users will ever conduct exactly the same search are pretty minimal, and you'll end up wasting RAM to cache result sets that will never be used again. Also, something like logs should be real-time. If you return a cached result set, there might be new, relevant results that are not included, obscuring the usefulness of your search.
As mentioned above, there are limits to what problems caching can solve. Since you are still building this application, I see no reason why you couldn't just plug in Django Haystack with Whoosh and see how it performs; switching to one of the more enterprise-grade search backends later is a breeze.
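If you go that route, the Haystack configuration is only a few lines of settings. This is a hypothetical settings.py fragment for the Whoosh backend; only the ENGINE string would need to change to move to Solr or Elasticsearch later:

import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    },
}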

Using memcached with a dynamic django backend

My Django backend is always dynamic. It serves an iOS app similar to that of Instagram and Vine where users upload photos/videos and their followers can comment and like the content. Just for the sake of this question, imagine my backend serves an iOS app that is exactly like Instagram.
Many sources claim that using memcached can improve performance because it decreases the amount of hits that are made to the database.
My question is: for a backend that is already dynamic in nature (always changing, since users are uploading new pictures, commenting, liking, following new users, etc.), what can I possibly cache?
It's a problem I've been thinking about for quite some time. I could cache the user profile data, but other than that, I don't know where else memcached would be useful.
Other sources mentioned using it everywhere in the backend where a 'GET' call is made, but then I would need to set a suitable expiry time on the cache since the app is always dynamic. What are your solutions and suggestions for getting around this problem?
You would cache whatever is most frequently read from your database. Make a list of the most frequent requests for data from the database and cache them in that order of priority. For example:
Cache the most frequent requests based on category of the pictures
Cache based on users - power users (those who access a lot of data) go into the cache
Cache the most recent inserts (in case you have a page which shows the recently added posts/pictures)
I am sure you can come up with more scenarios. I am positive memcached (or any other caching) will help, even though your app is very 'dynamic'.
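As a small illustration of caching user profile data (which the question already mentions), here is a sketch using Django's low-level cache API with memcached behind it; the UserProfile model and key format are assumptions:

from django.core.cache import cache
from django.db import models

class UserProfile(models.Model):
    # stand-in for whatever profile model the app actually has
    user_id = models.IntegerField(unique=True)
    bio = models.TextField(blank=True)

def get_user_profile(user_id):
    key = 'profile:%d' % user_id
    profile = cache.get(key)
    if profile is None:
        profile = UserProfile.objects.get(user_id=user_id)
        cache.set(key, profile, 60 * 5)      # short TTL as a safety net
    return profile

def invalidate_user_profile(user_id):
    # call this wherever the profile is edited so readers never see stale data
    cache.delete('profile:%d' % user_id)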

Designing backend software for multiplayer cross platform app

I am currently in the initial design phase of my first app.
In my app there will be individual sessions containing 1-5 users.
I need to be able to keep track of each user's GPS location and be able to push and pull those locations to each of the users. Each user will have the most recently reported location of every other user in the session.
There will be other calculations done on the data set but that will be client side, the server should only need to handle pushing and pulling of user locations (and the usernames).
I'm predicting that, due to the nature of the app, 90% of sessions should not last more than 2 hours, with the possibility of the server ending sessions that are older than 24-48 hours (once real-world testing of the app begins I will have a better idea of how long sessions should last).
I was thinking of using Django to build an API, and of storing all the data in the program itself rather than in a database, as this should be faster and I don't think persisting the data is necessary since it has such a short lifetime.
Is this a good starting point? Is there anything I should be thinking about or considering? I'm completely new to designing backend software.
While performance might not even be an issue in the beginning, there are some things you can do once you hit a certain load:
Keep all your session data in one model, even if that means denormalizing your database a bit (storing some redundant information). That way you only need one read from the database and no expensive JOINs (see the sketch after this list)
Use the Django caching framework (https://docs.djangoproject.com/en/dev/topics/cache/) to cache views, so multiple reads of the same data don't have to hit the database
Before you start optimizing, profile your code to see where your performance bottlenecks really are. Sometimes you'll be surprised which operations are expensive, and which aren't.
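As a sketch of the first two points (one denormalized model per session, plus Django's per-view cache), with field names and the 15-second timeout chosen purely for illustration:

from django.db import models
from django.http import JsonResponse
from django.views.decorators.cache import cache_page

class GameSession(models.Model):
    # latest location of every user in the session, stored as one JSON blob
    # so a single read returns everything (no JOINs)
    locations_json = models.TextField(default='{}')
    updated_at = models.DateTimeField(auto_now=True)

@cache_page(15)   # repeated pulls within 15 seconds never touch the database
def session_locations(request, session_id):
    session = GameSession.objects.get(pk=session_id)
    return JsonResponse({'locations': session.locations_json})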

How to cache MySQL table in C++ web service

I've got a big users table that I'm caching in a C++ web service (BitTorrent tracker). The entire table is refetched every 5 minutes. This has some drawbacks, like data being up to 5 minutes old and refetching lots of data that hasn't changed.
Is there a simple way to fetch just the changes since last time?
Ideally I'd not have to change the queries that update the data.
Two immediate possibilities come to mind:
MySQL Query Cache
Memcached (or similar) Caching Layer
I would try the query cache first, as it is likely far easier to set up. Do some basic tests/benchmarks and see if it fits your needs. Memcached will likely be very similar to your existing cache but, as you mention, you'll need to find a better way of invalidating stale cache entries (something that the query cache does for you).
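The tracker itself is C++, but the caching-layer pattern looks the same in any memcached client; here is a Python sketch for illustration, where fetch_user_from_db is a hypothetical stand-in for your existing MySQL query:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def fetch_user_from_db(user_id):
    # placeholder: run the existing SELECT against the users table here
    raise NotImplementedError

def get_user(user_id):
    key = 'user:%d' % user_id
    user = mc.get(key)
    if user is None:
        user = fetch_user_from_db(user_id)
        mc.set(key, user, time=300)   # 5-minute TTL, like the current refetch cycle
    return user

def on_user_updated(user_id):
    # if the update path can be touched after all, dropping the key here keeps
    # the cache fresh instead of relying on the TTL alone
    mc.delete('user:%d' % user_id)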