I want to implement some caching with Flask-Caching. In particular, I want to cache API responses, since the data is updated very rarely, less than once per day.
I can't figure out how to implement caching on a per-user basis. The API response includes some information that is calculated based on the user requesting it.
What is the right way to implement a per-user cache rather than a generic one?
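For illustration, here is a minimal sketch of what I have in mind: memoizing on the user id so that each user gets its own cache entry. The payload function is just a placeholder for my real view logic, and I'm assuming Flask-Login's current_user, so this may well not be the recommended approach:

from flask import Flask, jsonify
from flask_caching import Cache
from flask_login import current_user  # assumption: Flask-Login tracks the logged-in user

app = Flask(__name__)
cache = Cache(app, config={'CACHE_TYPE': 'SimpleCache'})  # 'simple' on older Flask-Caching versions

# memoize() builds the cache key from the function arguments, so passing the
# user id yields one cached entry per user instead of one shared entry.
@cache.memoize(timeout=24 * 60 * 60)  # the data changes less than once a day
def per_user_payload(user_id):
    # Placeholder for the real per-user calculation described above.
    return {'user_id': user_id, 'data': 'expensive per-user result'}

@app.route('/api/data')
def api_data():
    return jsonify(per_user_payload(current_user.get_id()))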
I'm looking for a convenient way to set up my Apollo hooks to query the cache first and, if the cache is empty, to make the endpoint call.
I can't seem to find the right documentation for this; I have only found how to query the cache and how to do a normal query.
Apollo Client allows you to specify the fetchPolicy for an individual query made using useQuery. Possible values include:
cache-first: This is the default value where we always try reading data from your cache first. If all the data needed to fulfill your query is in the cache then that data will be returned. Apollo will only fetch from the network if a cached result is not available. This fetch policy aims to minimize the number of network requests sent when rendering your component.
cache-and-network: This fetch policy has Apollo first try to read data from your cache. If all the data needed to fulfill your query is in the cache, that data is returned. However, regardless of whether the full data is in your cache, this fetch policy will always execute the query against the network, unlike cache-first, which only executes the query if the data is not in your cache. This fetch policy optimizes for users getting a quick response while also trying to keep cached data consistent with your server data, at the cost of extra network requests.
network-only: This fetch policy will never return initial data from the cache. Instead it will always make a request to the server using your network interface. It optimizes for data consistency with the server, but at the cost of an instant response to the user when one is available.
cache-only: This fetch policy will never execute a query using your network interface. Instead it will always try reading from the cache. If the data for your query does not exist in the cache then an error will be thrown. This fetch policy allows you to only interact with data in your local client cache without making any network requests which keeps your component fast, but means your local data might not be consistent with what is on the server. If you are interested in only interacting with data in your Apollo Client cache also be sure to look at the readQuery() and readFragment() methods available to you on your ApolloClient instance.
no-cache: This fetch policy will never return your initial data from the cache. Instead it will always make a request using your network interface to the server. Unlike the network-only policy, it also will not write any data to the cache after the query completes.
Since cache-first is the default, there's nothing else you need to do with your hook to get the desired behavior.
I have an API built in Django; it queries data from PostgreSQL and serializes it to JSON.
The response content-length of this API is 3 MB (with gzip; the uncompressed size is 20 MB), and the response time is about 10-20 seconds.
Is this performance reasonable? Is there any room for optimization?
If it takes 10-20 seconds, it sounds more like poor API design; however, I don't know your use case, so I can't be sure.
Check whether you can apply any of Ken's recommendations. Here are some more ideas:
Pagination is a great way to split data - if your data can be split into logical parts, DRF has a great number of ways to paginate querysets (see the sketch after this list).
Using the right gzip compression level could be a factor, given the size of your data. Read more about it here
See if you can use ETags, where the server sends 304 Not Modified if the API response has not changed between two consecutive calls. DRF does not support ETags out of the box AFAIK, so you will have to find a work-around (one possibility is sketched after this list).
Since you mentioned "real-time" data, I am assuming there is some concept of streaming temporal data. There are interesting ways to combine client-side caching with cursor-based pagination so that you only send the 'new' data, which you can explore. This only works if two preconditions are met: the changes to your API are incremental and temporal in nature.
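To make the pagination and ETag ideas concrete, two hedged sketches follow; the class names and the updated_at field are illustrative, not taken from the question.

Pagination can be switched on globally in settings, and a cursor-based class (which pairs well with the "only send new data" idea) can be attached per view:

# settings.py
REST_FRAMEWORK = {
    'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.PageNumberPagination',
    'PAGE_SIZE': 100,  # tune to your payload size
}

# pagination.py
from rest_framework.pagination import CursorPagination

class UpdatedCursorPagination(CursorPagination):
    # Cursor pagination needs a stable ordering; 'updated_at' is an assumed
    # timestamp field on the model.
    ordering = '-updated_at'
    page_size = 100

# In a view: pagination_class = UpdatedCursorPagination

And a rough ETag work-around for a DRF list view, deriving the tag from the newest updated_at value and answering 304 Not Modified when the client already has that version:

import hashlib

from rest_framework import status
from rest_framework.response import Response

class ETagListMixin:
    """Hypothetical work-around, not part of DRF: short-circuit unchanged
    list responses with 304 Not Modified."""

    def list(self, request, *args, **kwargs):
        latest = (self.get_queryset()
                  .order_by('-updated_at')
                  .values_list('updated_at', flat=True)
                  .first())
        etag = '"%s"' % hashlib.md5(str(latest).encode()).hexdigest()
        if request.META.get('HTTP_IF_NONE_MATCH') == etag:
            return Response(status=status.HTTP_304_NOT_MODIFIED)
        response = super().list(request, *args, **kwargs)
        response['ETag'] = etag
        return response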
I'm retrieving event data using the Google Analytics Real Time API, so as to trigger responses each time certain conditions are met while the user navigates.
This is my current query against the Google Analytics Real Time API (which works perfectly!):
def get_realtime_events(service, profile_id):
    # Wrapped in a helper function here for completeness; 'service' is the
    # authorized Analytics API client and 'profile_id' is the view (profile) ID.
    return service.data().realtime().get(
        ids='ga:' + profile_id,
        metrics='rt:totalEvents',
        dimensions='rt:eventAction,rt:eventLabel,rt:eventCategory',
        max_results='25').execute()
I'd like to show results grouped by each particular session or user, so as to trigger a message to a particular user if certain conditions are met.
Is that possible? And if so, how do I apply this criterion to the query?
"Trigger a message to a particular user" would imply that you either have personally identifiable data stored in GA, which would violate Googles TOS, or that you map an anonymous ID (clientid or UserID or similar) to a key stored in an external database (which might be legally murky, depending on your legislation). Since I don't want to throw away the answer I have written before reading your question to the end :-) I am going to assume the latter.
So, is that possible? No, not really. By default GA does not identify neither an identifier for the user (client id or user id) nor for the session (a session identifier is present only in the BigQuery export schema).
The Real Time API has a very limited set of dimensions (mostly, I think, because data aggregation does not happen in real time), so you can't even use custom dimensions. Your only chance would be to overwrite one of the standard fields, e.g. campaign information.
Of course this destroys the original data in that field. So you should use an extra view for the API query, send a custom dimension with the user identifier along, and then use an advanced filter to copy the custom dimension value to a standard field (while your original data stays safe in your other data views). This is a bit hackish, though.
Also, the Real Time API only shows the current hit per user, so you cannot group by user in the query in any case - you'd need to download the data, store it in an external database and do your aggregation there.
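To make that work-around concrete, here is a hedged sketch of how the query from the question might look once an advanced filter in the dedicated view has copied the user identifier into the campaign field. This assumes rt:campaign is queryable for your property and that the extra dimension stays within the API's dimension limit; the per-user aggregation still has to happen in your own database:

def get_realtime_events_by_user(service, profile_id):
    # Same query as in the question, with the campaign dimension added; after
    # the advanced filter described above it carries the user identifier.
    return service.data().realtime().get(
        ids='ga:' + profile_id,
        metrics='rt:totalEvents',
        dimensions='rt:eventAction,rt:eventLabel,rt:eventCategory,rt:campaign',
        max_results='25').execute()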
I'm currently working on an API built with Django REST Framework, uWSGI, nginx and Memcached.
I would like to know the best way to get user statistics, such as the number of requests per user, taking into consideration that the infrastructure will probably scale to multiple servers.
Also, is there a way to determine whether a response was served from the cache or from the application?
What I'm thinking of is processing the nginx logs to separate the requests by user and do all the calculations from there.
First: You might find drf-tracking to be a useful project, but it stores the response to every request in the database, which we found kind of crazy.
The solution we developed instead is a mixin that borrows heavily from drf-tracking, but that only logs statistics. This solution uses our cache server (Redis), so it's very fast.
It's very simple to drop in if you're already using Redis:
from datetime import date

import redis
from django.conf import settings


class LoggingMixin(object):
    """Log requests to Redis.

    This draws inspiration from the code that can be found at:
    https://github.com/aschn/drf-tracking/blob/master/rest_framework_tracking/mixins.py

    The big distinctions, however, are that this code uses Redis for greater
    speed, and that it logs significantly less information.

    We want to know:
     - How many queries in last X days, total?
     - How many queries ever, total?
     - How many queries total made by user X?
     - How many queries per day made by user X?
    """

    def initial(self, request, *args, **kwargs):
        super(LoggingMixin, self).initial(request, *args, **kwargs)

        d = date.today().isoformat()
        user = request.user
        endpoint = request.resolver_match.url_name

        r = redis.StrictRedis(
            host=settings.REDIS_HOST,
            port=settings.REDIS_PORT,
            db=settings.REDIS_DATABASES['STATS'],
        )
        pipe = r.pipeline()

        # Global and daily tallies for all URLs.
        pipe.incr('api:v3.count')
        pipe.incr('api:v3.d:%s.count' % d)

        # Use a sorted set to store the user stats, with the score representing
        # the number of queries the user made, total or on a given day.
        # (redis-py >= 3.0 expects zincrby(name, amount, value).)
        pipe.zincrby('api:v3.user.counts', 1, user.pk)
        pipe.zincrby('api:v3.user.d:%s.counts' % d, 1, user.pk)

        # Use a sorted set to store all the endpoints, with the score representing
        # the number of queries the endpoint received, total or on a given day.
        pipe.zincrby('api:v3.endpoint.counts', 1, endpoint)
        pipe.zincrby('api:v3.endpoint.d:%s.counts' % d, 1, endpoint)

        pipe.execute()
Put that somewhere in your project, and then add the mixin to your various views, like so:
class ThingViewSet(LoggingMixin, viewsets.ModelViewSet):
    # More stuff here.
    pass
Some notes about the class:
It uses Redis pipelines to make all of the Redis queries hit the server with one request instead of six.
It uses Sorted Sets to keep track of which endpoints in your API are being used the most and which users are using the API the most (a sketch for reading those numbers back out follows these notes).
It creates a few new keys in your cache per day -- there might be better ways to do this, but I can't find any.
This should be a fairly flexible starting point for logging the API.
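For reading the numbers back out (for a dashboard or a management command), here is a sketch that reuses the same connection settings and key names as the mixin above; the function name is just illustrative:

from datetime import date

import redis
from django.conf import settings


def top_api_users_today(limit=10):
    """Return (user_pk, count) pairs for today's heaviest API users."""
    r = redis.StrictRedis(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        db=settings.REDIS_DATABASES['STATS'],
    )
    key = 'api:v3.user.d:%s.counts' % date.today().isoformat()
    # Sorted-set members are user pks and scores are request counts; highest first.
    return r.zrevrange(key, 0, limit - 1, withscores=True)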
What I usually do is have a centralized cache server (Redis) and log all requests there, with all the custom calculations or fields I need. Then you can build your own dashboard.
OR
Go with Logstash from the company behind Elasticsearch. It is very well done, saves you time and scales very well. I'd say give it a shot: http://michael.bouvy.net/blog/en/2013/11/19/collect-visualize-your-logs-logstash-elasticsearch-redis-kibana/
I have a User model which hasMany phones. The UI for the user allows adding/deleting/updating phones on a single form.
When the user submits the form, all changes to the phone list are sent to the server in a single request.
I have extended App.UserSerializer with a custom serializeHasMany to include all the phone details in that single request.
The real problem is syncing the store state after the request completes.
Basically I need to solve these two problems:
Remove deleted records from the store. I could not find any method which simply removes a record from the store.
Update new records with the ids generated by the server. (Or just remove the new records from the store and the hasMany array, since the response creates duplicates of the added records.)
Are there any best practices or work-arounds for this kind of scenario?
Thank you.
I think the best practice for now is just sticking to regular REST. In your case this will mean a few extra requests (really though, how many phones can a user have?), but it will spare you a lot of effort in handling things manually.
Ember may support bulk updates in the future (https://github.com/emberjs/data/blob/master/TRANSITION.md, "We plan to support batch saving with a single HTTP request through a dedicated API in the future.")