Appfabric local cache notifications - appfabric

I want to see if my understanding of Appfabric local cache invalidation is correct
Assume I have notification based invalidation set up on my local cache
The default polling interval is 5 minutes
Which way does the polling occur? I believe the local cache polls the distributed cache to check for notifications, is this correct?
Does that mean that if a change occurs to the distributed cache it could be anywhere up to 5 minutes before that item in the local cache is invalidated depending on when the last sync occurred?
Is there any way to see the last synched time, through powershell or another mechanism?

Yes, local polls server each pollInterval. The interval can be customized.
Yes, that's correct
Doubt about powershell. Maybe there will be some trace events in case you use Set-CacheLogging but I didn't try. What will definitely work is to subscribe to cache notifications right in the code and put a breakpoint into it.

Related

Being notified when CloudFront has completed deployment

Is it possible to get notified when CloudFront distribution has been deployed? Or more generally when the Status (and possibly State) has changed?
In principle I could create a periodic (say, 15 minutes) checker that would poll the distribution list and see whether any property has changed, but considering most of the time this doesn't really change it feels wasteful, plus it is more work (I need to remember the previous state, so the check cannot be stateless).
So a push-based (where AWS notifies me, instead of me asking) solution is preferred.
CloudFront does not provide any push notifications for distribution configuration or state changes.

Amazon Elasticache Failover

We have been using AWS Elasticache for about 6 months now without any issues. Every night we have a Java app that runs which will flush DB 0 of our redis cache and then repopulate it with updated data. However we had 3 instances between July 31 and August 5 where our DB was successfully flushed and then we were not able to write the new data to the database.
We were getting the following exception in our application:
redis.clients.jedis.exceptions.JedisDataException:
redis.clients.jedis.exceptions.JedisDataException: READONLY You can't
write against a read only slave.
When we look at the cache events in Elasticache we can see
Failover from master node prod-redis-001 to replica node
prod-redis-002 completed
We have not been able to diagnose the issue and since the app was running fine for the past 6 months I am wondering if it is something related to a recent Elasticache release that was done on the 30th of June.
https://aws.amazon.com/releasenotes/Amazon-ElastiCache
We have always been writing to our master node and we only have 1 replica node.
If someone could offer any insight it would be much appreciated.
EDIT: This seems to be an intermittent problem. Some days it will fail other days it runs fine.
We have been in contact with AWS support for the past few weeks and this is what we have found.
Most Redis requests are synchronous including the flush so it will block all other requests. In our case we are actually flushing 19m keys and it takes more then 30 seconds.
Elasticache performs a health check periodically and since the flush is running the health check will be blocked, thus causing a failover.
We have been asking the support team how often the health check is performed so we can get an idea of why our flush is only causing a failover 3-4 times a week. The best answer we can get is "We think its every 30 seconds". However our flush consistently takes more then 30 seconds and doesn't consistently fail.
They said that they may implement the ability to configure the timing of the health check however they said this would not be done anytime soon.
The best advice they could give us is:
1) Create a completely new cluster for loading the new data on, and
instead of flushing the previous cluster, re-point your application(s)
to the new cluster, and remove the old one.
2) If the data that you are flushing is an update version of the data,
consider not flushing, but updating and overwriting new keys?
3) Instead of flushing the data, set the expiry of the items to be
when you would normally flush, and let the keys be reclaimed (possibly
with a random time to avoid thundering herd issues), and then reload
the data.
Hope this helps :)
Currently for Redis versions from 6.2 AWS ElastiCache has a new feature of thread monitoring. So the health check doesn't happen in the same thread as all other actions of Redis. Redis can continue to proceed a long command / lua script, but will still considered healthy. Because of this new feature failovers should happen less.

AppFabric Syncing Local Caches

We have a very simple AppFabric setup where there are two clients -- lets call them Server A and Server B. Server A is also the lead cache host, and both Server A and B have a local cache enabled. We'd like to be able to make an update to an item from server B and have that change propagate to the local cache of Server A within 30 seconds (for example).
As I understand it, there appears to be two different ways of getting changes propagated to the client:
Set a timeout on the client cache to evict items every X seconds. On next request for the item it will get the item from the host cache since the local cache doesn't have the item
Enable notifications and effectively subscribe to get updates from the cache host
If my requirement is to get updates to all clients within 30 seconds then setting a timeout of less than 30 seconds on the local cache appears to be the only choice if going with option #1 above. Due to the size of the cache, this would be inefficient to evict all of the cache (99.99% of which probably hasn't changed in the last 30 seconds).
I think what we need to implement is option #2 above, but I'm not sure I understand how this works. I've read all of the msdn documentation (http://msdn.microsoft.com/en-us/library/ee808091.aspx) and have looked at some examples but it is still unclear to me whether it is really necessary to write custom code or if this is only if you want to do extra handling.
So my question is: is it necessary to add code to your existing application if want to have updates propagated to all local caches via notifications, or is the callback feature just an bonus way of adding extra handling or code if a notification is pushed down? Can I just enable Notifications and set the appropriate polling interval at the client and things will just work?
It seems like the default behavior (when Notifications are enabled) should be to pull down fresh items automatically at each polling interval.
I ran some tests and am happy to say that you do NOT need to write any code to ensure that all clients are kept in sync. If you set the following as a child element of the cluster config:
In the client config you need to set sync="NotificationBased" on the element.
The element in the client config will tell the client how often it should check for new notifications on the server. In this case, every 15 seconds the client will check for notifications and pull down any items that have changed.
I'm guessing the callback logic that you can add to your app is just in case you want to add your own special logic (like emailing the president every time an item changes in the cache).

Redis is taking too long to respond

Experiencing very high response latency with Redis, to the point of not being able to output information when using the info command through redis-cli.
This server handles requests from around 200 concurrent processes but it does not store too much information (at least to our knowledge). When the server is responsive, the info command reports used memory around 20 - 30 MB.
When running top on the server, during periods of high response latency, CPU usage hovers around 95 - 100%.
What are some possible causes for this kind of behavior?
It is difficult to propose an explanation only based on the provided data, but here is my guess. I suppose that you have already checked the obvious latency sources (the ones linked to persistence), that no Redis command is hogging the CPU in the slow log, and that the size of the job data pickled by Python-rq is not huge.
According to the documentation, Python-rq inserts the jobs into Redis as hash objects, and let Redis expires the related keys (500 seconds seems to be the default value) to get rid of the jobs. If you have some serious throughput, at a point, you will have many items in Redis waiting to be expired. Their number will be high compared to the pending jobs.
You can check this point by looking at the number of items to be expired in the result of the INFO command.
Redis expiration is based on a lazy mechanism (applied when a key is accessed), and a active mechanism based on key sampling, which is run in the event loop (in pseudo background mode, every 100 ms). The point is when the active expiration mechanism is running, no Redis command can be processed.
To avoid impacting the performance of the client applications too much, only a limited number of keys are processed each time the active mechanism is triggered (by default, 10 keys). However, if more than 25% keys are found to be expired, it tries to expire more keys and loops. This is the way this probabilistic algorithm automatically adapt its activity to the number of keys Redis has to expire.
When many keys are to be expired, this adaptive algorithm can impact the performance of Redis significantly though. You can find more information here.
My suggestion would be to try to prevent Python-rq to delegate item cleaning to Redis by setting expiration. This is a poor design for a queuing system anyway.
I think reduce ttl should not be the right way to avoid CPU usage when Redis expire keys.
Didier says, with a good point, that the current architecture of Python-rq that it delegates the cleaning jobs to Redis by using the key-expire feature. And surely, like Didier said it is not the best way. ( this is used only when result_ttl is greater than 0 )
Then the problem should rise when you have a set of keys/jobs with a expiration dates near one of the other, and it could be done when you have a bursts of job creation.
But Python-rq sets expire key when the job has been finished in one worker,
Then it doesn't have too sense, because the keys should spread around the time with enough time between them to avoid this situation

Windows Server AppFabric Cache Time-out based invalidation callback

I am using Windows Server AppFabric Caching in our application with local cache enabled.
This is configured as following:
<localCache isEnabled="true" sync="TimeoutBased" objectCount="1000" ttlValue="120"/>
I have setup time-out based invalidation with time-out interval of 120 seconds.
As per this configuration, local cache will remove items from in-memory cache after every 120 seconds and retrieve item from cache cluster. Is it possible to add a callback which gets fired whenever local cache tries to hit the cache cluster to retrieve items instead of fetching them locally?
Unfortunately, there is no way to know if data is fetched locally or not. There are cache server notifications but they are not reliable.
In your scenario, a good approach could be the Read-Through and Write-Behind feature. It does not fit to all situations but your can take a quick look.
Here are some links :
http://msdn.microsoft.com/en-us/library/hh377669.aspx
http://blogs.msdn.com/b/prathul/archive/2011/12/06/appfabric-cache-read-from-amp-write-to-database-read-through-write-behind.aspx