Rails: expire action cache using a regular expression with Memcache

I am working on a Rails application and have integrated caching with memcached using Dalli. I am using action caching and expiring the cache with sweepers.
My sweeper code looks like this:
    class BoardSweeper < ActionController::Caching::Sweeper
      observe Board

      def after_create(board)
        expire_cache(board)
      end

      def expire_cache(board)
        expire_action :controller => :boards, :action => :show, :id => "#{board.id}-#{board.title.parameterize}"
      end
    end
But I want to delete cache entries using a regular expression, i.e. match the URL and delete every entry that matches.
If my board show URLs look like this:
"boards/1/likes-collection-of-branded-products.text/javascript"
"boards/1/likes-collection-of-branded-products.text/html"
then I want to use the following expression to expire the cache:
Rails.cache.delete_matched(/boards\/1.*/)
But according to the memcache API docs, the delete_matched method is not supported.
I am sure there must be some way to delete on the basis of a regex. Please help.
Many thanks!

AFAIK the problem with memcached is that there is no simple way to retrieve all keys. That is why it has no functionality for expiring entries based on a regular expression.
What you could do:
Use a naming convention for your cache keys and simply expire every cache key you know of that might have been created.
This imposes a little overhead, because you also expire keys that were never created.
Overall, I would not advise using action caching. There are good reasons it was removed from Rails 4 (it now lives in a separate gem).
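The naming-convention idea above can be sketched independently of the framework. The following is a minimal Python illustration, not Rails or Dalli code: the key layout copies the URLs in the question, the list of formats is an assumption, and a plain dict stands in for the cache store.

```python
# Hypothetical illustration: none of these names come from Rails or Dalli.
KNOWN_FORMATS = ["text/html", "text/javascript"]  # formats seen in the question's URLs


def board_cache_keys(board_id, slug, formats=KNOWN_FORMATS):
    """Return every cache key that may exist for one board's show page."""
    return ["boards/%d/%s.%s" % (board_id, slug, fmt) for fmt in formats]


def expire_board(cache, board_id, slug):
    """Delete each candidate key; deleting a key that was never set is harmless."""
    for key in board_cache_keys(board_id, slug):
        cache.pop(key, None)  # `cache` is a plain dict standing in for the store


cache = {
    "boards/1/likes-collection-of-branded-products.text/html": "<html>...</html>",
    "boards/2/other-board.text/html": "<html>...</html>",
}
expire_board(cache, 1, "likes-collection-of-branded-products")
print(sorted(cache))  # ['boards/2/other-board.text/html']
```

The cost is one delete per possible key, whether or not that key was ever written, which is the overhead mentioned above.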

You could try the Gibson cache server as your primary cache store. It supports invalidating multiple keys by prefix in O(1) time thanks to its internal data structure, so in your case it would just be a matter of:
Rails.cache.delete_matched "boards/1/"
An ActiveSupport extension is being developed and will be released in about a week.

Related

Path based AWS API Caching Keys Issue

I have several API paths set up in a test API Gateway setup with a simple 'api' stage. I am using AWS Lambda and wish to cache the results of the lambda call.
There are three test paths (no authentication)
/a/{thing} (GET Caching turned on in stage)
/b/{thing} (GET Caching turned off in stage)
/c/{thing} (GET Caching turned off in stage)
They all map to the same lambda function. The lambda function returns the current time and the value of {thing}.
If I request /a/0000 through /a/1000, I always get back the same result: the one from the function run for thing=0000.
If I request /b/0000 through /b/1000 (or /c/), I get back uncached results.
thing is selected as 'cache' in the resource /a/{thing}. Nothing else is set to 'cache'.
It is my understanding that selecting 'cache' next to a path element, query element, or header would construct a cache key - possibly a multi-key cache key hash. That would be ideal!
Ideally /a/0000 and /a/1234 would return a cached version keyed to the {thing} value.
What did I do wrong, misread, or step over? Am I hitting a bug related to AWS Lambda? Is caching keyed to authorization? These URLs are public and unauthenticated. I'm just using curl to request them, so nothing is being cached on the client side.
I've also tried using a query argument as the only cache key, flushed the cache, and waited 30 minutes before trying again. I'm still not getting the results I would expect.
Pro Tip:
You still have to deploy from resources to the stage after you set up cache keys. This makes sense, of course, but it would help if the management console showed more about the method parameters than it does.
I am using Chalice, which is why I wasn't deploying in the normal fashion.

Google Cloud CDN started ignoring query strings for storage buckets

Some months ago we activated Cloud CDN for storage buckets. Our storage data is regularly changed via a backend, so to invalidate the cached version we added a query param with the changedDate to the URL that is served to the client.
Back then this worked well.
Sometime in the last months (probably weeks) Google seems to have changed that and now ignores the query string when caching from storage buckets.
First part: does anyone know why this was changed and why no one was notified about it?
Second part: how can you invalidate the cache for a particular object in a storage bucket without sending a cache-invalidation request (which you shouldn't) every time?
I don't like the idea of deleting the old file and uploading a new file with a changed filename every time something is uploaded...
EDIT:
For clarification: the official documentation ( cloud.google.com/cdn/docs/caching ) already states that query strings are now ignored for storage buckets:
For backend buckets, the cache key consists of the URI without the query string. Thus https://example.com/images/cat.jpg, https://example.com/images/cat.jpg?user=user1, and https://example.com/images/cat.jpg?user=user2 are equivalent.
We were also affected by this. After we contacted Google Support, they confirmed that this is a permanent change. The recommended workaround is either to use versioning in the object name or to use cache invalidation. The latter sounds a bit odd, as the cache invalidation documentation states:
Invalidation is intended for use in exceptional circumstances, not as part of your normal workflow.
For backend buckets, the cache key consists of the URI without the query string, as the official documentation states. The bucket does not evaluate the query string, but the CDN should still do so. I reproduced this scenario, and it is currently still possible to use a query string as a cache buster.
It seems the reason for the change is that the old behavior resulted in lost caching opportunities, higher costs, and higher latency. The only recommended workarounds for now are to create new objects with the version incorporated into the object's name (which does not seem to be a valid option in your case) or to use cache invalidation.
Invalidating the cache for a particular object requires using a particular query. A Cache-Control header that allows such objects to be cached only for a certain time may be your workaround. The Cloud CDN cache has an expiration time defined by the "Cache-Control: s-maxage", "Cache-Control: max-age", and/or Expires headers.
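The object-name versioning workaround mentioned above could look roughly like the sketch below. It only derives a new object name from the change date; the function name and the timestamp layout are assumptions for illustration, and the upload itself is left out.

```python
from datetime import datetime, timezone
import posixpath


def versioned_object_name(name, changed_at):
    """Embed the change timestamp in the object name, before the extension."""
    stem, ext = posixpath.splitext(name)
    # Normalize to UTC so the same change always yields the same name.
    stamp = changed_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return "%s.%s%s" % (stem, stamp, ext)


changed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(versioned_object_name("images/cat.jpg", changed))
# images/cat.20240102T030405Z.jpg
```

Because each change produces a new URI, the CDN treats it as a new cache entry and no invalidation request is needed; the old object can be deleted later.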
According to the docs, when a backend bucket is used as the origin for Cloud CDN, query strings in the request URL are not included in the cache key:
For backend buckets, the cache key consists of the URI without the protocol, host, or query string.
Using the query string to identify different versions of cached content may not be the best practice promoted by GCP, but for some legacy setups it is unavoidable.
So, one way to work around this is to make the backend bucket a static website (do NOT enable CDN here), then use a custom origin (Cloud CDN backed by an Internet network endpoint group backend service) that points to that static website.
For backend services, the query string IS part of the cache key:
For backend services, Cloud CDN defaults to using the complete request URI as the cache key
That's it. Yes, it is tedious, but it works!

Elasticsearch self.published?

I am using the elasticsearch-rails gem. For my site I need to create custom callbacks: https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model#custom-callbacks
But I am really confused by one thing. What does "if self.published?" mean in this code?
I tried to use this in my models:
    after_commit on: [:update] do
      place.__elasticsearch__.update_document if self.published?
    end
But for the model in the console I see self.published? => false, and I don't know what this means.
From the elasticsearch-rails documentation:
For ActiveRecord-based models, use the after_commit callback to protect your data against inconsistencies caused by transaction rollbacks:
I think it is used to make sure everything has been written to the database successfully before syncing to the Elasticsearch server.

How can I schedule the invalidation of a redis cache?

I am using django as a framework to build a content management system for a site with a blog.
Each blog post will have a route that contains a unique identifier for the blog post. These blog posts can be scheduled and have an expiry date. This means that the routes have to be dynamic.
The entire site needs to be cached, and we have Redis set up as a back-end cache. We currently cache rendered pages against our static routes, but need to find a way of caching pages against the dynamic routes (and invalidating them when the blog posts expire).
I could use a cron job but it isn't appropriate because...
a) New blog posts go live rarely and not periodically
b) Users can schedule posts to the minute. This means that a cron job would have to run every minute which seems like overkill!
I've just found the django-cacheops library, which seems to do exactly what I need (schedule the invalidation of our cache and invalidate them via signals). Is this compatible with our existing setup and how easy is the setup?
I assume this is a pretty common problem - does anyone have any better ideas than the above?
I can't comment on django-cacheops because I've never used it, but Redis provides a really easy way to do this using the EXPIRE command:
Set a timeout on key. After the timeout has expired, the key will automatically be deleted.
Usage:
SET some_key "some_value"
EXPIRE some_key 10
The key some_key will now automatically be cleaned/deleted by Redis in 10 seconds. If you need to delete blog posts' cache knowing when they should be deleted from the outset, this should serve your needs perfectly.
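When the expiry date of a post is known at the time its page is cached, the timeout can be derived from it. The sketch below assumes a cache client with a set(key, value, timeout) method (Django's cache API has that shape); FakeCache, the key layout, and the function names are hypothetical and only make the example runnable without a Redis server.

```python
from datetime import datetime, timedelta, timezone


def seconds_until(expires_at, now):
    """Whole seconds from `now` until `expires_at`, never less than zero."""
    return max(0, int((expires_at - now).total_seconds()))


class FakeCache:
    """Stand-in for a real cache client; records the timeout it was given."""

    def __init__(self):
        self.entries = {}

    def set(self, key, value, timeout):
        self.entries[key] = (value, timeout)


def cache_post_page(cache, post_id, html, expires_at, now):
    """Cache a rendered page so that it disappears when the post expires."""
    ttl = seconds_until(expires_at, now)
    if ttl > 0:  # do not cache pages for posts that have already expired
        cache.set("post:%d" % post_id, html, ttl)
    return ttl


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cache = FakeCache()
print(cache_post_page(cache, 42, "<html>...</html>", now + timedelta(minutes=90), now))
# 5400
```

With this approach no cron job is needed: the cache store drops the entry on its own at the post's expiry time.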
Cacheops invalidates the cache when a post is changed; that's its primary use. But you can also expire by timeout:
    from cacheops import cached_as, cached_view_as

    # A queryset
    post = Post.objects.cache(timeout=your_timeout).get(pk=post_pk)

    # A function
    @cached_as(Post.objects.filter(pk=post_pk), timeout=your_timeout)
    def get_post_data(...):
        ...

    # A view
    @cached_view_as(Post, timeout=your_timeout)
    def post(request, ...):
        ...
However, there is currently no way to specify a timeout that depends on the cached object.

Updating a hit counter when an image is accessed in Django

I am working on some simple analytics for a Django website (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look for a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse status code of 200 or whatever), and then query the server and update a hit field for the corresponding image. The problem is that the only way to access the path of the image is in process_request, while the only way to access the status code is in process_response.
So, what do I do? Is it as simple as creating a class variable that can hold the path and then lookup the file once the response code of 200 is returned, or should I just use django-hitcount?
Thanks for your help
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
You could use memcache to store the counters and then periodically persist them to the database. There are risks that memcache will evict the value before it's been persisted but this could be acceptable to you.
This article provides more information and highlights a risk arising when using hosted memcache with keys distributed over multiple servers. http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests
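A minimal sketch of the buffer-then-persist idea, written as old-style middleware. Note that process_response receives the request as well as the response, so the path and the status code are both available in one place. The URL prefix, the in-process Counter, and the drain helper are assumptions for illustration; a shared store such as memcache would replace the Counter when several processes serve requests.

```python
from collections import Counter
from types import SimpleNamespace


class ImageHitMiddleware:
    """Old-style middleware sketch: count successful image responses."""

    hits = Counter()           # in-process buffer (hypothetical); lost on restart
    prefix = "/media/images/"  # assumed URL prefix for image requests

    def process_response(self, request, response):
        # Count only successful responses for paths under the image prefix.
        if response.status_code == 200 and request.path.startswith(self.prefix):
            self.hits[request.path] += 1
        return response

    @classmethod
    def drain(cls):
        """Return the buffered counts and reset them, for a periodic job to persist."""
        counts, cls.hits = dict(cls.hits), Counter()
        return counts


# Simulated requests; SimpleNamespace stands in for HttpRequest and HttpResponse.
mw = ImageHitMiddleware()
mw.process_response(SimpleNamespace(path="/media/images/a.png"), SimpleNamespace(status_code=200))
mw.process_response(SimpleNamespace(path="/media/images/a.png"), SimpleNamespace(status_code=404))
print(ImageHitMiddleware.drain())  # {'/media/images/a.png': 1}
```

A periodic task would call drain() and add the returned counts to each image's hit field, so the database is written in batches instead of once per request.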